Systems and methods for signaling hypothetical reference decoder parameters in a parameter set

ABSTRACT

Techniques and systems are provided for encoding and decoding video data. For example, a method of encoding video data includes generating an encoded video bitstream comprising multiple layers. The encoded video bitstream includes a video parameter set defining parameters of the encoded video bitstream. The video parameter set includes video usability information. The method further includes determining whether timing information is signaled in the video usability information of the video parameter set. The method further includes determining whether to signal hypothetical reference decoder parameters in the video usability information of the video parameter set based on whether timing information is signaled in the video usability information.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a Continuation of U.S. application Ser. No. 14/743,556, filed Jun. 18, 2015, which claims the benefit of U.S. Provisional Application No. 62/015,285, filed Jun. 20, 2014, both of which are hereby incorporated by reference, in their entirety. This application is related to U.S. application Ser. No. 14/743,434 (Attorney Docket No. 145689U1), titled “SYSTEMS AND METHODS FOR SIGNALING INFORMATION FOR LAYER SETS IN A PARAMETER SET,” filed on Jun. 18, 2015, and U.S. application Ser. No. 14/743,613 (Attorney Docket No. 145689U3), titled “SYSTEMS AND METHODS FOR SELECTIVELY SIGNALING DIFFERENT NUMBERS OF VIDEO SIGNAL INFORMATION SYNTAX STRUCTURES IN A PARAMETER SET,” filed on Jun. 18, 2015, both of which are hereby incorporated herein by reference, in their entirety.

FIELD

The present disclosure generally relates to video coding, and more specifically to techniques and systems for signaling hypothetical reference decoder parameters in a parameter set.

BACKGROUND

Many devices and systems allow video data to be processed and output for consumption. Digital video data includes large amounts of data to meet the demands of consumers and video providers. For example, consumers of video data desire video of the utmost quality, with high fidelity, resolutions, frame rates, and the like. As a result, the large amount of video data that is required to meet these demands places a burden on communication networks and devices that process and store the video data.

Various video coding techniques may be used to compress video data. Video coding is performed according to one or more video coding standards. For example, video coding standards include high efficiency video coding (HEVC), advanced video coding (AVC), moving picture experts group (MPEG) coding, or the like. Video coding generally utilizes prediction methods (e.g., inter-prediction, intra-prediction, or the like) that take advantage of redundancy present in video images or sequences. An important goal of video coding techniques is to compress video data into a form that uses a lower bit rate, while avoiding or minimizing degradations to video quality. With ever-evolving video services becoming available, encoding techniques with better coding efficiency are needed.

BRIEF SUMMARY

In some embodiments, techniques and systems are described for signaling information for layer sets in a parameter set. A layer set includes a set of layers of a bitstream that are self-contained so that the layers in a given layer set can form an independent bitstream representing video content. The parameter set may include a video parameter set. The parameter set may be provided with an encoded video bitstream and may define parameters of the encoded video bitstream. One or more layer sets may be defined in a base part of the parameter set, and one or more additional layer sets not defined in the base part may be defined in an extension part of the parameter set. The base part of the parameter set may be defined in an initial edition of a video coding standard (e.g., a first edition of the high efficiency video coding standard, or other coding standard), and the extension part of the parameter set may be defined in a later edition of the video coding standard. The base and extension parts of the parameter set may include signaling information describing characteristics of one or more layer sets (including the additional layer sets). For example, the signaling information may describe rate information (e.g., bit rate information, picture rate information, or other rate information) for the one or more layer sets (including the additional layer sets). In another example, the signaling information may include information indicating whether a layer in a layer set is a target output layer of an output layer set. Embodiments are described herein for signaling such information for all layer sets defined in the base and extension parts of the parameter set.

According to at least one example of signaling information in a parameter set for layer sets, a method of encoding video data is provided that includes generating an encoded video bitstream with one or more layer sets and one or more additional layer sets. Each of a layer set and an additional layer set includes one or more layers, and the encoded video bitstream includes a video parameter set defining parameters of the encoded video bitstream. The one or more layer sets are defined in a base part of the video parameter set, and the one or more additional layer sets are defined in an extension part of the video parameter set. The method further includes providing, in the video parameter set, one or more syntax elements for signaling information related to the one or more layer sets and the one or more additional layer sets. The information includes rate information for the one or more layer sets defined in the base part of the video parameter set and for the one or more additional layer sets defined in the extension part of the video parameter set.

In another example, an apparatus is provided that includes a memory configured to store video data and a processor. The processor is configured to and may generate, from the video data, an encoded video bitstream comprising one or more layer sets and one or more additional layer sets. Each of a layer set and an additional layer set includes one or more layers, and the encoded video bitstream includes a video parameter set defining parameters of the encoded video bitstream. The one or more layer sets are defined in a base part of the video parameter set, and the one or more additional layer sets are defined in an extension part of the video parameter set. The processor is further configured to and may provide, in the video parameter set, one or more syntax elements for signaling information related to the one or more layer sets and the one or more additional layer sets. The information includes rate information for the one or more layer sets defined in the base part of the video parameter set and for the one or more additional layer sets defined in the extension part of the video parameter set.

In another example, a computer readable medium is provided having stored thereon instructions that when executed by a processor perform a method that includes: generating an encoded video bitstream comprising one or more layer sets and one or more additional layer sets, wherein each of a layer set and an additional layer set includes one or more layers, the encoded video bitstream including a video parameter set defining parameters of the encoded video bitstream, wherein the one or more layer sets are defined in a base part of the video parameter set, and wherein the one or more additional layer sets are defined in an extension part of the video parameter set; and providing, in the video parameter set, one or more syntax elements for signaling information related to the one or more layer sets and the one or more additional layer sets, the information including rate information for the one or more layer sets defined in the base part of the video parameter set and for the one or more additional layer sets defined in the extension part of the video parameter set.

In another example, an apparatus is provided that includes means for generating an encoded video bitstream comprising one or more layer sets and one or more additional layer sets. Each of a layer set and an additional layer set includes one or more layers, and the encoded video bitstream includes a video parameter set defining parameters of the encoded video bitstream. The one or more layer sets are defined in a base part of the video parameter set, and the one or more additional layer sets are defined in an extension part of the video parameter set. The apparatus further includes means for providing, in the video parameter set, one or more syntax elements for signaling information related to the one or more layer sets and the one or more additional layer sets. The information includes rate information for the one or more layer sets defined in the base part of the video parameter set and for the one or more additional layer sets defined in the extension part of the video parameter set.

In another example of signaling information for layer sets in a parameter set, a method of decoding video data is provided that includes obtaining an encoded video bitstream comprising one or more layer sets and one or more additional layer sets. Each of a layer set and an additional layer set includes one or more layers, and the encoded video bitstream includes a video parameter set defining parameters of the encoded video bitstream. The one or more layer sets are defined in a base part of the video parameter set, and the one or more additional layer sets are defined in an extension part of the video parameter set. The method further includes decoding one or more syntax elements from the video parameter set. The one or more syntax elements include rate information for the one or more layer sets defined in the base part of the video parameter set and for the one or more additional layer sets defined in the extension part of the video parameter set.

In another example, an apparatus is provided that includes a memory configured to store video data and a processor. The processor is configured to and may obtain an encoded video bitstream comprising one or more layer sets and one or more additional layer sets. Each of a layer set and an additional layer set includes one or more layers, and the encoded video bitstream includes a video parameter set defining parameters of the encoded video bitstream. The one or more layer sets are defined in a base part of the video parameter set, and the one or more additional layer sets are defined in an extension part of the video parameter set. The processor is further configured to and may decode one or more syntax elements from the video parameter set. The one or more syntax elements include rate information for the one or more layer sets defined in the base part of the video parameter set and for the one or more additional layer sets defined in the extension part of the video parameter set.

In another example, a computer readable medium is provided having stored thereon instructions that when executed by a processor perform a method that includes: obtaining an encoded video bitstream comprising one or more layer sets and one or more additional layer sets, wherein each of a layer set and an additional layer set includes one or more layers, the encoded video bitstream including a video parameter set defining parameters of the encoded video bitstream, wherein the one or more layer sets are defined in a base part of the video parameter set, and wherein the one or more additional layer sets are defined in an extension part of the video parameter set; and decoding one or more syntax elements from the video parameter set, the one or more syntax elements including rate information for the one or more layer sets defined in the base part of the video parameter set and for the one or more additional layer sets defined in the extension part of the video parameter set.

In another example, an apparatus is provided that includes means for obtaining an encoded video bitstream comprising one or more layer sets and one or more additional layer sets. Each of a layer set and an additional layer set includes one or more layers, and the encoded video bitstream includes a video parameter set defining parameters of the encoded video bitstream. The one or more layer sets are defined in a base part of the video parameter set, and the one or more additional layer sets are defined in an extension part of the video parameter set. The apparatus further includes means for decoding one or more syntax elements from the video parameter set. The one or more syntax elements include rate information for the one or more layer sets defined in the base part of the video parameter set and for the one or more additional layer sets defined in the extension part of the video parameter set.

In some aspects, different rate information is signaled for each different layer set of the one or more layer sets and the one or more additional layer sets. In some aspects, the rate information includes bit rate information. In some aspects, the rate information includes picture rate information.

In some aspects, the one or more syntax elements in the video parameter set include a flag, the flag indicating whether bit rate information is available for an additional layer set. In some aspects, the one or more syntax elements in the video parameter set include a flag, the flag indicating whether picture rate information is available for an additional layer set. In some aspects, the one or more syntax elements in the video parameter set include a syntax element, the syntax element indicating an average bit rate for an additional layer set. In some examples, the one or more syntax elements in the video parameter set include a syntax element, the syntax element indicating a maximum bit rate for an additional layer set.

In some aspects, the one or more syntax elements in the video parameter set include a syntax element, the syntax element indicating whether an additional layer set has a constant picture rate. In some aspects, the one or more syntax elements in the video parameter set include a syntax element, the syntax element indicating an average picture rate for an additional layer set. In some aspects, the one or more syntax elements in the video parameter set include a flag, the flag indicating whether a layer in an additional layer set is a target output layer of an output layer set.
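By way of illustration only, the rate information aspects described above may be sketched as follows. The sketch assumes that a presence flag gates each group of rate values and that layer sets from the base part and the extension part are indexed consecutively. The names RateInfo, rate_syntax_elements, and the element labels are hypothetical and are not syntax element names taken from any coding standard.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class RateInfo:
    """Hypothetical container for the rate information of one layer set."""
    avg_bit_rate: Optional[int] = None
    max_bit_rate: Optional[int] = None
    constant_pic_rate: Optional[bool] = None
    avg_pic_rate: Optional[int] = None


def rate_syntax_elements(
    base_sets: List[RateInfo], additional_sets: List[RateInfo]
) -> List[Tuple[str, int, int]]:
    """List (label, layer_set_index, value) entries for every layer set.

    Layer sets from the base part come first, followed by the additional
    layer sets from the extension part, so that both kinds are covered.
    """
    elements: List[Tuple[str, int, int]] = []
    for idx, info in enumerate(list(base_sets) + list(additional_sets)):
        bit_rate_present = (
            info.avg_bit_rate is not None and info.max_bit_rate is not None
        )
        pic_rate_present = (
            info.constant_pic_rate is not None and info.avg_pic_rate is not None
        )
        # Presence flags are always listed; the values follow only when present.
        elements.append(("bit_rate_present_flag", idx, int(bit_rate_present)))
        elements.append(("pic_rate_present_flag", idx, int(pic_rate_present)))
        if bit_rate_present:
            elements.append(("avg_bit_rate", idx, info.avg_bit_rate))
            elements.append(("max_bit_rate", idx, info.max_bit_rate))
        if pic_rate_present:
            elements.append(("constant_pic_rate", idx, int(info.constant_pic_rate)))
            elements.append(("avg_pic_rate", idx, info.avg_pic_rate))
    return elements
```

In this sketch, a layer set defined in the extension part receives the same presence flags and rate values as a layer set defined in the base part, which reflects the aspect that rate information is signaled for all layer sets.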

In some embodiments, techniques and systems are described for signaling hypothetical reference decoder (HRD) parameters in a parameter set in only certain conditions. In some examples, sets of hypothetical reference decoder parameters may be provided in a parameter set and used to check that a bitstream or a sub-bitstream can be properly decoded. For example, the hypothetical reference decoder parameters may be signaled in a video usability information (VUI) part of a video parameter set (VPS), or the VPS VUI. The signaling of the hypothetical reference decoder parameters in the VPS VUI may be controlled by a gating flag. For example, hypothetical reference decoder parameters may not be signaled in the VPS VUI when a value of the gating flag is set to 0 in some examples, or 1 in other examples. Embodiments are described herein for signaling hypothetical reference decoder parameters in the VPS VUI when certain information is signaled in the VPS or the VPS VUI. For example, hypothetical reference decoder parameters may be signaled in the VPS VUI when timing information is also signaled in the VPS or the VPS VUI. Similarly, hypothetical reference decoder parameters may not be signaled in the VPS VUI when no timing information is signaled in the VPS or the VPS VUI. In some aspects, an encoder (or other device, such as an editor, splicer, or the like) may condition the value of the gating flag to be dependent on a value of a syntax element that indicates whether timing information is present in the VPS or the VPS VUI. For example, when the syntax element is set to a value (e.g., 0 or 1) indicating that no timing information is present, the gating flag may not be signaled and thus inferred to be a certain value indicating that no hypothetical reference decoder parameters are to be signaled. In another example, when the syntax element is set to a value indicating that no timing information is present, the gating flag may be signaled with the flag set to the certain value.
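By way of illustration only, the two gating-flag alternatives described above may be sketched as follows. The sketch assumes that a flag value of 1 indicates presence and a value of 0 indicates absence, which is only one of the conventions contemplated above. The function names and the always_signal_gate option are hypothetical and are introduced solely for this example.

```python
from typing import List, Tuple


def encode_vui_flags(
    timing_present: bool, want_hrd: bool, always_signal_gate: bool = False
) -> List[int]:
    """Return the flag bits written for the timing flag and the gating flag."""
    bits = [int(timing_present)]
    if timing_present:
        # Timing information is present, so the gating flag is signaled
        # and may take either value.
        bits.append(int(want_hrd))
    elif always_signal_gate:
        # Second alternative: the gating flag is signaled but forced to 0.
        bits.append(0)
    # First alternative: the gating flag is omitted and inferred to be 0.
    return bits


def decode_vui_flags(
    bits: List[int], always_signal_gate: bool = False
) -> Tuple[bool, bool]:
    """Return (timing_present, hrd_present) recovered from the flag bits."""
    timing_present = bits[0] == 1
    if timing_present or always_signal_gate:
        hrd_present = bits[1] == 1
    else:
        hrd_present = False  # inferred, since the gating flag was not signaled
    if hrd_present and not timing_present:
        raise ValueError("gating flag set without timing information")
    return timing_present, hrd_present
```

Under either alternative in this sketch, a decoder never expects hypothetical reference decoder parameters unless timing information is also present.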

According to at least one example of signaling hypothetical reference decoder parameters in a parameter set, a method of encoding video data is provided that includes generating an encoded video bitstream comprising multiple layers. The encoded video bitstream includes a video parameter set defining parameters of the encoded video bitstream. The video parameter set includes video usability information. The method further includes determining whether timing information is signaled in the video usability information of the video parameter set. The method further includes determining whether to signal hypothetical reference decoder parameters in the video usability information of the video parameter set based on whether timing information is signaled in the video usability information.

In another example, an apparatus is provided that includes a memory configured to store video data and a processor. The processor is configured to and may generate, from the video data, an encoded video bitstream comprising multiple layers. The encoded video bitstream includes a video parameter set defining parameters of the encoded video bitstream. The video parameter set includes video usability information. The processor is further configured to and may determine whether timing information is signaled in the video usability information of the video parameter set. The processor is further configured to and may determine whether to signal hypothetical reference decoder parameters in the video usability information of the video parameter set based on whether timing information is signaled in the video usability information.

In another example, a computer readable medium is provided having stored thereon instructions that when executed by a processor perform a method that includes: generating an encoded video bitstream comprising multiple layers, the encoded video bitstream including a video parameter set defining parameters of the encoded video bitstream, wherein the video parameter set includes video usability information; determining whether timing information is signaled in the video usability information of the video parameter set; and determining whether to signal hypothetical reference decoder parameters in the video usability information of the video parameter set based on whether timing information is signaled in the video usability information.

In another example, an apparatus is provided that includes means for generating an encoded video bitstream comprising multiple layers. The encoded video bitstream includes a video parameter set defining parameters of the encoded video bitstream. The video parameter set includes video usability information. The apparatus further includes means for determining whether timing information is signaled in the video usability information of the video parameter set. The apparatus further includes means for determining whether to signal hypothetical reference decoder parameters in the video usability information of the video parameter set based on whether timing information is signaled in the video usability information.

The method, apparatuses, and computer readable medium described above for signaling hypothetical reference decoder parameters in a parameter set may further include signaling the hypothetical reference decoder parameters in the video usability information when timing information is signaled in the video usability information. The method, apparatuses, and computer readable medium described above for signaling hypothetical reference decoder parameters in a parameter set may further include not signaling the hypothetical reference decoder parameters in the video usability information when timing information is not signaled in the video usability information.

In some aspects, determining whether the timing information is signaled in the video usability information of the video parameter set includes determining a value of a first flag in the video usability information, the first flag indicating whether the timing information is signaled in the video usability information.

The method, apparatuses, and computer readable medium described above for signaling hypothetical reference decoder parameters in a parameter set may further include determining a value of a second flag in the video usability information based on the value of the first flag, the second flag defining whether hypothetical reference decoder parameters are signaled in the video usability information.

The method, apparatuses, and computer readable medium described above for signaling hypothetical reference decoder parameters in a parameter set may further include providing, in the video usability information, one or more syntax elements for signaling information related to the encoded video bitstream, the information including a condition that the value of the second flag is dependent on the value of the first flag.

The method, apparatuses, and computer readable medium described above for signaling hypothetical reference decoder parameters in a parameter set may further include providing, in the video usability information, one or more syntax elements for signaling information related to the encoded video bitstream, the information including a constraint that the value of the second flag is to be set to zero when the value of the first flag is equal to zero.

In some aspects, the method is executable on a wireless communication device. The wireless communication device comprises a memory configured to store the video data and a processor configured to execute instructions to process the video data stored in the memory. The wireless communication device further comprises a transmitter configured to transmit the encoded video bitstream including the video parameter set. In some aspects, the wireless communication device is a cellular telephone and the encoded video bitstream is modulated according to a cellular communication standard.

In some aspects, the apparatus is a wireless communication device. The wireless communication device comprises a transmitter configured to transmit the encoded video bitstream including the video parameter set. In some aspects, the wireless communication device is a cellular telephone and the encoded video bitstream is modulated according to a cellular communication standard.

In some embodiments, techniques and systems are described for selectively signaling different numbers of video signal information syntax structures in a parameter set. In some examples, an encoder that encodes video data according to a first coding protocol may generate an encoded video bitstream. The encoder may provide the encoded video bitstream to a decoder in a receiving device. A base layer for video data may be provided to the decoder (or another decoder in the same receiving device) by an external source other than the encoder that uses the first coding protocol. For example, the base layer may be encoded according to a second coding protocol that is different than the first coding protocol. In such an example, an encoder that encodes video data using the second coding protocol may provide the base layer to the receiving device. A video signal information syntax structure is signaled for each layer of a multi-layer encoded video bitstream, with a separate video signal information syntax structure being signaled for each layer. In some cases, a number of video signal information syntax structures to include in a parameter set (e.g., video parameter set) is not signaled. In such cases, the number of video signal information syntax structures may be inferred to be equal to the number of layers in the encoded video bitstream. Embodiments are described herein for determining a number of video signal information syntax structures to signal in the parameter set based on whether the base layer is included in the encoded video bitstream or to be provided to the receiving device from the external source.

According to at least one example of selectively signaling different numbers of video signal information syntax structures in a parameter set, a method of encoding video data is provided that includes generating an encoded video bitstream according to a first coding protocol. The encoded video bitstream includes one or more enhancement layers and a video parameter set defining parameters of the encoded video bitstream. The method further includes determining that a syntax structure indicative of the number of video signal information syntax structures provided in the encoded video bitstream is not present in the video parameter set. The method further includes determining the number of video signal information syntax structures to include in the video parameter set when the syntax structure indicative of the number of video signal information syntax structures provided in the encoded video bitstream is not present in the video parameter set. The number is determined as a first value or a second value based on whether a base layer is included in the encoded video bitstream or to be provided to a decoder from an external source.

In another example, an apparatus is provided that includes a memory configured to store video data and a processor. The processor is configured to and may generate, from the video data, an encoded video bitstream according to a first coding protocol. The encoded video bitstream includes one or more enhancement layers and a video parameter set defining parameters of the encoded video bitstream. The processor is further configured to and may determine that a syntax structure indicative of the number of video signal information syntax structures provided in the encoded video bitstream is not present in the video parameter set. The processor is further configured to and may determine the number of video signal information syntax structures to include in the video parameter set when the syntax structure indicative of the number of video signal information syntax structures provided in the encoded video bitstream is not present in the video parameter set. The number is determined as a first value or a second value based on whether a base layer is included in the encoded video bitstream or to be provided to a decoder from an external source.

In another example, a computer readable medium is provided having stored thereon instructions that when executed by a processor perform a method that includes: generating an encoded video bitstream according to a first coding protocol, the encoded video bitstream including one or more enhancement layers and a video parameter set defining parameters of the encoded video bitstream; determining that a syntax structure indicative of the number of video signal information syntax structures provided in the encoded video bitstream is not present in the video parameter set; and determining the number of video signal information syntax structures to include in the video parameter set when the syntax structure indicative of the number of video signal information syntax structures provided in the encoded video bitstream is not present in the video parameter set, wherein the number is determined as a first value or a second value based on whether a base layer is included in the encoded video bitstream or to be provided to a decoder from an external source.

In another example, an apparatus is provided that includes means for generating an encoded video bitstream according to a first coding protocol. The encoded video bitstream includes one or more enhancement layers and a video parameter set defining parameters of the encoded video bitstream. The apparatus further includes means for determining that a syntax structure indicative of the number of video signal information syntax structures provided in the encoded video bitstream is not present in the video parameter set. The apparatus further includes means for determining the number of video signal information syntax structures to include in the video parameter set when the syntax structure indicative of the number of video signal information syntax structures provided in the encoded video bitstream is not present in the video parameter set. The number is determined as a first value or a second value based on whether a base layer is included in the encoded video bitstream or to be provided to a decoder from an external source.

In some aspects, the number of video signal information syntax structures to include in the video parameter set is determined as the first value when it is determined that the base layer is included in the encoded video bitstream, wherein the first value is equal to a maximum number of layers of the encoded video bitstream.

In some aspects, the number of video signal information syntax structures to include in the video parameter set is determined as the second value when it is determined that the base layer is to be provided to the decoder from the external source, wherein the second value is equal to a maximum number of layers of the encoded video bitstream minus one.

In some aspects, a video signal information syntax structure is assigned to each of the layers included in the encoded video bitstream, and no video signal information syntax structure is assigned to the base layer that is to be provided to the decoder from the external source.
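By way of illustration only, the inference of the number of video signal information syntax structures and the corresponding assignment to layers may be sketched as follows. The sketch assumes that layer index 0 denotes the base layer and that structures are assigned to the remaining layers in increasing layer order. The function and parameter names are hypothetical and are introduced solely for this example.

```python
from typing import Dict, Optional


def infer_num_video_signal_info(
    max_layers: int,
    base_layer_in_bitstream: bool,
    signaled_count: Optional[int] = None,
) -> int:
    """Return the number of video signal information syntax structures."""
    if signaled_count is not None:
        # The count is explicitly signaled, so no inference is needed.
        return signaled_count
    if max_layers < 1:
        raise ValueError("max_layers must be at least 1")
    # First value: one structure per layer when the base layer is included.
    # Second value: one fewer when the base layer comes from an external source.
    return max_layers if base_layer_in_bitstream else max_layers - 1


def assign_structures(
    max_layers: int, base_layer_in_bitstream: bool
) -> Dict[int, Optional[int]]:
    """Map each layer index to a structure index, or None if unassigned."""
    mapping: Dict[int, Optional[int]] = {}
    next_index = 0
    for layer in range(max_layers):
        if layer == 0 and not base_layer_in_bitstream:
            mapping[layer] = None  # externally provided base layer
        else:
            mapping[layer] = next_index
            next_index += 1
    return mapping
```

For a bitstream with a maximum of three layers in this sketch, three structures are inferred when the base layer is included in the bitstream and two when the base layer is provided from an external source, and in the latter case the base layer is left without an assigned structure.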

In some aspects, the base layer provided from the external source is encoded according to a second coding protocol, the second coding protocol being different than the first coding protocol. In some examples, the first coding protocol includes a high efficiency video coding protocol, and the second coding protocol includes an advanced video coding protocol.

In another example of selectively signaling different numbers of video signal information syntax structures in a parameter set, a method of decoding video data is provided that includes accessing an encoded video bitstream encoded according to a first coding protocol. The encoded video bitstream includes one or more enhancement layers and a video parameter set defining parameters of the encoded video bitstream. The method further includes determining that a syntax structure indicative of the number of video signal information syntax structures provided in the encoded video bitstream is not present in the video parameter set. The method further includes determining whether a base layer is included in the encoded video bitstream or to be received from an external source. The method further includes determining the number of video signal information syntax structures included in the video parameter set to be a first value or a second value based on whether the base layer is included in the encoded video bitstream or to be received from the external source.

In another example, an apparatus is provided that includes a memory configured to store video data and a processor. The processor is configured to and may access an encoded video bitstream encoded according to a first coding protocol. The encoded video bitstream includes one or more enhancement layers and a video parameter set defining parameters of the encoded video bitstream. The processor is further configured to and may determine that a syntax structure indicative of the number of video signal information syntax structures provided in the encoded video bitstream is not present in the video parameter set. The processor is further configured to and may determine whether a base layer is included in the encoded video bitstream or to be received from an external source. The processor is further configured to and may determine the number of video signal information syntax structures included in the video parameter set to be a first value or a second value based on whether the base layer is included in the encoded video bitstream or to be received from the external source.

In another example, a computer readable medium is provided having stored thereon instructions that when executed by a processor perform a method that includes: accessing an encoded video bitstream encoded according to a first coding protocol, the encoded video bitstream including one or more enhancement layers and a video parameter set defining parameters of the encoded video bitstream; determining that a syntax structure indicative of the number of video signal information syntax structures provided in the encoded video bitstream is not present in the video parameter set; determining whether a base layer is included in the encoded video bitstream or to be received from an external source; and determining the number of video signal information syntax structures included in the video parameter set to be a first value or a second value based on whether the base layer is included in the encoded video bitstream or to be received from the external source.

In another example, an apparatus is provided that includes means for accessing an encoded video bitstream encoded according to a first coding protocol. The encoded video bitstream includes one or more enhancement layers and a video parameter set defining parameters of the encoded video bitstream. The apparatus further includes means for determining that a syntax structure indicative of the number of video signal information syntax structures provided in the encoded video bitstream is not present in the video parameter set. The apparatus further includes means for determining whether a base layer is included in the encoded video bitstream or to be received from an external source. The apparatus further includes means for determining the number of video signal information syntax structures included in the video parameter set to be a first value or a second value based on whether the base layer is included in the encoded video bitstream or to be received from the external source.

The method, apparatuses, and computer readable medium described above for selectively signaling different numbers of video signal information syntax structures in a parameter set may further include determining the number of video signal information syntax structures to be the first value when it is determined that the base layer is included in the encoded video bitstream, wherein the first value is equal to a maximum number of layers of the encoded video bitstream.

The method, apparatuses, and computer readable medium described above for selectively signaling different numbers of video signal information syntax structures in a parameter set may further include determining the number of video signal information syntax structures to be the second value when it is determined that the base layer is to be received from the external source, wherein the second value is equal to a maximum number of layers of the encoded video bitstream minus one.

In some aspects, a video signal information syntax structure is assigned to each of the layers included in the encoded video bitstream, and no video signal information syntax structure is assigned to the base layer that is to be received from the external source.
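The inference described above can be illustrated with a minimal sketch. The function name and parameters below are hypothetical and chosen only for illustration; the sketch assumes that the count is inferred solely from the maximum number of layers and from whether the base layer is carried in the bitstream.

```python
def infer_num_video_signal_info(max_layers: int, base_layer_in_bitstream: bool) -> int:
    """Infer the number of video signal information syntax structures when
    the syntax structure that would signal that number is absent.

    Hypothetical helper for illustration only.
    """
    if max_layers < 1:
        raise ValueError("max_layers must be at least 1")
    if base_layer_in_bitstream:
        # First value: one structure per layer, including the base layer.
        return max_layers
    # Second value: the externally provided base layer gets no structure.
    return max_layers - 1
```

For example, with three layers the sketch yields three structures when the base layer is in the bitstream and two when the base layer is received from an external source.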

In some aspects, the base layer provided from the external source is encoded according to a second coding protocol, the second coding protocol being different than the first coding protocol. In some aspects, the first coding protocol includes a high efficiency video coding protocol, and the second coding protocol includes an advanced video coding protocol.

This summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used in isolation to determine the scope of the claimed subject matter. The subject matter should be understood by reference to appropriate portions of the entire specification of this patent, any or all drawings, and each claim.

The foregoing, together with other features and embodiments, will become more apparent upon referring to the following specification, claims, and accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

Illustrative embodiments of the present invention are described in detail below with reference to the following drawing figures:

FIG. 1 is a block diagram illustrating an example of an encoding device and a decoding device, in accordance with some embodiments.

FIG. 2 is a block diagram illustrating an example of layer sets defined in a base part and an extension part of a parameter set, in accordance with some embodiments.

FIG. 3 is an example of a syntax structure of a parameter set, in accordance with some embodiments.

FIG. 4 is another example of a syntax structure of a parameter set, in accordance with some embodiments.

FIG. 5 is another example of a syntax structure of a parameter set, in accordance with some embodiments.

FIG. 6 is another example of a syntax structure of a parameter set, in accordance with some embodiments.

FIG. 7 is a flowchart illustrating an embodiment of a process of encoding video data for signaling information for layer sets in a parameter set, in accordance with some embodiments.

FIG. 8 is a flowchart illustrating an embodiment of a process of decoding video data including signaled information for layer sets in a parameter set, in accordance with some embodiments.

FIG. 9A is another example of a syntax structure of a parameter set, in accordance with some embodiments.

FIG. 9B is another example of a syntax structure of a parameter set, in accordance with some embodiments.

FIG. 10 is a flowchart illustrating an embodiment of a process of encoding video data for signaling hypothetical reference decoder parameters in a parameter set, in accordance with some embodiments.

FIG. 11 is a block diagram illustrating an environment with an encoding device for providing encoded video data with multiple layers, in accordance with some embodiments.

FIG. 12 is a block diagram illustrating an environment with multiple encoding devices for providing encoded video data with multiple layers, in accordance with some embodiments.

FIG. 13 is an example of a parameter set with video signal information for multiple layers of encoded video data, in accordance with some embodiments.

FIG. 14 is a flowchart illustrating an embodiment of a process of encoding video data for selectively signaling different numbers of video signal information syntax structures in a parameter set, in accordance with some embodiments.

FIG. 15 is a flowchart illustrating an embodiment of a process of decoding video data for inferring different numbers of video signal information syntax structures in a parameter set, in accordance with some embodiments.

FIG. 16 is a block diagram illustrating an example video encoding device, in accordance with some embodiments.

FIG. 17 is a block diagram illustrating an example video decoding device, in accordance with some embodiments.

DETAILED DESCRIPTION

Certain aspects and embodiments of this disclosure are provided below. Some of these aspects and embodiments may be applied independently and some of them may be applied in combination as would be apparent to those of skill in the art. In the following description, for the purposes of explanation, specific details are set forth in order to provide a thorough understanding of embodiments of the invention. However, it will be apparent that various embodiments may be practiced without these specific details. The figures and description are not intended to be restrictive.

The ensuing description provides exemplary embodiments only, and is not intended to limit the scope, applicability, or configuration of the disclosure. Rather, the ensuing description of the exemplary embodiments will provide those skilled in the art with an enabling description for implementing an exemplary embodiment. It should be understood that various changes may be made in the function and arrangement of elements without departing from the spirit and scope of the invention as set forth in the appended claims.

Specific details are given in the following description to provide a thorough understanding of the embodiments. However, it will be understood by one of ordinary skill in the art that the embodiments may be practiced without these specific details. For example, circuits, systems, networks, processes, and other components may be shown as components in block diagram form in order not to obscure the embodiments in unnecessary detail. In other instances, well-known circuits, processes, algorithms, structures, and techniques may be shown without unnecessary detail in order to avoid obscuring the embodiments.

Also, it is noted that individual embodiments may be described as a process which is depicted as a flowchart, a flow diagram, a data flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be re-arranged. A process is terminated when its operations are completed, but could have additional steps not included in a figure. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc. When a process corresponds to a function, its termination can correspond to a return of the function to the calling function or the main function.

The term “computer-readable medium” includes, but is not limited to, portable or non-portable storage devices, optical storage devices, and various other mediums capable of storing, containing, or carrying instruction(s) and/or data. A computer-readable medium may include a non-transitory medium in which data can be stored and that does not include carrier waves and/or transitory electronic signals propagating wirelessly or over wired connections. Examples of a non-transitory medium may include, but are not limited to, a magnetic disk or tape, optical storage media such as compact disk (CD) or digital versatile disk (DVD), flash memory, memory or memory devices. A computer-readable medium may have stored thereon code and/or machine-executable instructions that may represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a class, or any combination of instructions, data structures, or program statements. A code segment may be coupled to another code segment or a hardware circuit by passing and/or receiving information, data, arguments, parameters, or memory contents. Information, arguments, parameters, data, etc. may be passed, forwarded, or transmitted via any suitable means including memory sharing, message passing, token passing, network transmission, or the like.

Furthermore, embodiments may be implemented by hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. When implemented in software, firmware, middleware or microcode, the program code or code segments to perform the necessary tasks (e.g., a computer-program product) may be stored in a computer-readable or machine-readable medium. A processor(s) may perform the necessary tasks.

Several systems and methods of video coding using video encoders and decoders are described herein. For example, one or more systems and methods of coding are directed to improving the signaling of different information in a parameter set, such as the video parameter set (VPS) described in the high efficiency video coding (HEVC) standard.

As more devices and systems provide consumers with the ability to consume digital video data, the need for efficient video coding techniques becomes more important. Video coding is needed to reduce storage and transmission requirements necessary to handle the large amounts of data present in digital video data. Various video coding techniques may be used to compress video data into a form that uses a lower bit rate while maintaining high video quality.

FIG. 1 is a block diagram illustrating an example of a system 100 including an encoding device 104 and a decoding device 112. The encoding device 104 may be part of a source device, and the decoding device 112 may be part of a receiving device. The source device and/or the receiving device may include an electronic device, such as a mobile or stationary telephone handset (e.g., smartphone, cellular telephone, or the like), a desktop computer, a laptop or notebook computer, a tablet computer, a set-top box, a television, a camera, a display device, a digital media player, a video gaming console, a video streaming device, or any other suitable electronic device. In some examples, the source device and the receiving device may include one or more wireless transceivers for wireless communications. The coding techniques described herein are applicable to video coding in various multimedia applications, including streaming video transmissions (e.g., over the Internet), television broadcasts or transmissions, encoding of digital video for storage on a data storage medium, decoding of digital video stored on a data storage medium, or other applications. In some examples, system 100 can support one-way or two-way video transmission to support applications such as video conferencing, video streaming, video playback, video broadcasting, gaming, and/or video telephony.

The encoding device 104 (or encoder) can be used to encode video data using a video coding standard or protocol to generate an encoded video bitstream. Video coding standards include ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual and ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including its Scalable Video Coding (SVC) and Multiview Video Coding (MVC) extensions. A more recent video coding standard, High Efficiency Video Coding (HEVC), has been finalized by the Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T Video Coding Experts Group (VCEG) and ISO/IEC Moving Picture Experts Group (MPEG). Various extensions to HEVC deal with multi-layer video coding and are also being developed by the JCT-VC, including the multiview extension to HEVC, called MV-HEVC, and the scalable extension to HEVC, called SHVC. The encoding device 104 may also use any other suitable coding protocol.

Many embodiments described herein describe examples using the HEVC standard, or extensions thereof. However, the techniques and systems described herein may also be applicable to other coding standards, such as AVC, MPEG, extensions thereof, or other suitable coding standards. Accordingly, while the techniques and systems described herein may be described with reference to a particular video coding standard, one of ordinary skill in the art will appreciate that the description should not be interpreted to apply only to that particular standard.

A video source 102 may provide the video data to the encoding device 104. The video source 102 may be part of the source device, or may be part of a device other than the source device. The video source 102 may include a video capture device (e.g., a video camera, a camera phone, a video phone, or the like), a video archive containing stored video, a video server or content provider providing video data, a video feed interface receiving video from a video server or content provider, a computer graphics system for generating computer graphics video data, a combination of such sources, or any other suitable video source.

The video data from the video source 102 may include one or more input pictures or frames. A picture or frame is a still image that is part of a video. The encoder engine 106 (or encoder) of the encoding device 104 encodes the video data to generate an encoded video bitstream. An HEVC bitstream, for example, may include a sequence of data units called network abstraction layer (NAL) units. Two classes of NAL units exist in the HEVC standard, including video coding layer (VCL) NAL units and non-VCL NAL units. A VCL NAL unit includes one slice or slice segment (described below) of coded picture data, and a non-VCL NAL unit includes control information that relates to multiple coded pictures. A coded picture and the non-VCL NAL units (if any) corresponding to the coded picture are together called an access unit (AU).

NAL units may contain a sequence of bits forming a coded representation of the video data (the encoded video bitstream), such as coded representations of pictures in a video. The encoder engine 106 generates coded representations of pictures by partitioning each picture into multiple slices. A slice is independent of other slices so that information in the slice is coded without dependency on data from other slices within the same picture. A slice includes one or more slice segments including an independent slice segment and, if present, one or more dependent slice segments that depend on previous slice segments. The slices are then partitioned into coding tree blocks (CTBs) of luma samples and chroma samples. A CTB of luma samples and one or more CTBs of chroma samples, along with syntax for the samples, are referred to as a coding tree unit (CTU). A CTU is the basic processing unit for HEVC encoding. A CTU can be split into multiple coding units (CUs) of varying sizes. A CU contains luma and chroma sample arrays that are referred to as coding blocks (CBs).
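The splitting of a CTU into CUs of varying sizes can be sketched as a recursive quadtree. The following is a minimal illustration only; the function name, the `should_split` callback, and the minimum CU size of 8 are assumptions made for the sketch and are not taken from this description.

```python
def split_ctu(x, y, size, should_split, min_size=8):
    """Recursively split a square block into four quadrants.

    Returns a list of (x, y, size) tuples, one per resulting CU.
    `should_split` is a hypothetical encoder decision callback.
    """
    if size > min_size and should_split(x, y, size):
        half = size // 2
        cus = []
        for dy in (0, half):
            for dx in (0, half):
                cus.extend(split_ctu(x + dx, y + dy, half, should_split, min_size))
        return cus
    # Leaf of the quadtree: this block is coded as one CU.
    return [(x, y, size)]


# Example decision: keep splitting only the top-left block while it is larger than 16.
cus = split_ctu(0, 0, 64, lambda x, y, size: x == 0 and y == 0 and size > 16)
```

In this example the 64×64 CTU yields three 32×32 CUs and four 16×16 CUs, which together cover the full CTU area.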

The luma and chroma CBs can be further split into prediction blocks (PBs). A PB is a block of samples of the luma or a chroma component that uses the same motion parameters for inter-prediction. The luma PB and one or more chroma PBs, together with associated syntax, form a prediction unit (PU). A set of motion parameters is signaled in the bitstream for each PU and is used for inter-prediction of the luma PB and the one or more chroma PBs. A CB can also be partitioned into one or more transform blocks (TBs). A TB represents a square block of samples of a color component on which the same two-dimensional transform is applied for coding a prediction residual signal. A transform unit (TU) represents the TBs of luma and chroma samples, and corresponding syntax elements.

A size of a CU corresponds to a size of the coding node and is square in shape. For example, a size of a CU may be 8×8 samples, 16×16 samples, 32×32 samples, 64×64 samples, or any other appropriate size up to the size of the corresponding CTU. The phrase “N×N” is used herein to refer to pixel dimensions of a video block in terms of vertical and horizontal dimensions (e.g., 8 pixels×8 pixels). The pixels in a block may be arranged in rows and columns. In some embodiments, blocks may not have the same number of pixels in a horizontal direction as in a vertical direction. Syntax data associated with a CU may describe, for example, partitioning of the CU into one or more PUs. Partitioning modes may differ between whether the CU is intra-prediction mode encoded or inter-prediction mode encoded. PUs may be partitioned to be non-square in shape. Syntax data associated with a CU may also describe, for example, partitioning of the CU into one or more TUs according to a CTU. A TU can be square or non-square in shape.

According to the HEVC standard, transformations may be performed using transform units (TUs). TUs may vary for different CUs. The TUs may be sized based on the size of PUs within a given CU. The TUs may be the same size or smaller than the PUs. In some examples, residual samples corresponding to a CU may be subdivided into smaller units using a quadtree structure known as residual quad tree (RQT). Leaf nodes of the RQT may correspond to TUs. Pixel difference values associated with the TUs may be transformed to produce transform coefficients. The transform coefficients may then be quantized by the encoder engine 106.

Once the pictures of the video data are partitioned into CUs, the encoder engine 106 predicts each PU using a prediction mode. The prediction is then subtracted from the original video data to get residuals (described below). For each CU, a prediction mode may be signaled inside the bitstream using syntax data. A prediction mode may include intra-prediction (or intra-picture prediction) or inter-prediction (or inter-picture prediction). Using intra-prediction, each PU is predicted from neighboring image data in the same picture using, for example, DC prediction to find an average value for the PU, planar prediction to fit a planar surface to the PU, directional prediction to extrapolate from neighboring data, or any other suitable types of prediction. Using inter-prediction, each PU is predicted using motion compensation prediction from image data in one or more reference pictures (before or after the current picture in output order).
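As a simplified illustration of DC prediction followed by residual computation, the sketch below averages the neighboring samples above and to the left of a block and subtracts that average from every sample. The function names and the rounding rule are assumptions for illustration; an actual codec applies additional rules (e.g., neighbor availability and boundary filtering) that are omitted here.

```python
def dc_predict(top_neighbors, left_neighbors):
    """Return a single DC prediction value: the rounded integer average
    of the neighboring samples (hypothetical helper)."""
    neighbors = list(top_neighbors) + list(left_neighbors)
    if not neighbors:
        raise ValueError("at least one neighboring sample is required")
    return (sum(neighbors) + len(neighbors) // 2) // len(neighbors)


def residual_block(original, prediction_value):
    """Subtract the prediction from each original sample to get residuals."""
    return [[sample - prediction_value for sample in row] for row in original]


dc = dc_predict([10, 12], [14, 16])               # average of 10, 12, 14, 16
residuals = residual_block([[13, 15], [11, 13]], dc)
```

The residuals, rather than the original samples, are what the later transform and quantization stages operate on.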

A PU may include data related to the prediction process. For example, when the PU is encoded using intra-prediction, the PU may include data describing an intra-prediction mode for the PU. As another example, when the PU is encoded using inter-prediction, the PU may include data defining a motion vector for the PU. The data defining the motion vector for a PU may describe, for example, a horizontal component of the motion vector, a vertical component of the motion vector, a resolution for the motion vector (e.g., one-quarter pixel precision or one-eighth pixel precision), a reference picture to which the motion vector points, and/or a reference picture list (e.g., List 0, List 1, or List C) for the motion vector.

The encoding device 104 may then perform transformation and quantization. For example, following prediction, the encoder engine 106 may calculate residual values corresponding to the PU. Residual values may comprise pixel difference values. Any residual data that may be remaining after prediction is performed is transformed using a block transform, which may be based on discrete cosine transform, discrete sine transform, an integer transform, a wavelet transform, or other suitable transform function. In some cases, one or more block transforms (e.g., sizes 32×32, 16×16, 8×8, 4×4, or the like) may be applied to residual data in each CU. In some embodiments, a TU may be used for the transform and quantization processes implemented by the encoder engine 106. A given CU having one or more PUs may also include one or more TUs. As described in further detail below, the residual values may be transformed into transform coefficients using the block transforms, and then may be quantized and scanned using TUs to produce serialized transform coefficients for entropy coding.

In some embodiments following intra-predictive or inter-predictive coding using PUs of a CU, the encoder engine 106 may calculate residual data for the TUs of the CU. The PUs may comprise pixel data in the spatial domain (or pixel domain). The TUs may comprise coefficients in the transform domain following application of a block transform. As previously noted, the residual data may correspond to pixel difference values between pixels of the unencoded picture and prediction values corresponding to the PUs. Encoder engine 106 may form the TUs including the residual data for the CU, and may then transform the TUs to produce transform coefficients for the CU.

The encoder engine 106 may perform quantization of the transform coefficients. Quantization provides further compression by quantizing the transform coefficients to reduce the amount of data used to represent the coefficients. For example, quantization may reduce the bit depth associated with some or all of the coefficients. In one example, a coefficient with an n-bit value may be rounded down to an m-bit value during quantization, with n being greater than m.
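The bit-depth reduction in the last example can be sketched as dropping the n − m least significant bits of the coefficient magnitude. This is a minimal illustration under that assumption; the function names are hypothetical, and practical encoders derive the step size from a quantization parameter rather than from a fixed shift.

```python
def quantize(value, n_bits, m_bits):
    """Round an n-bit coefficient magnitude down to m bits (n > m)."""
    if not 0 < m_bits < n_bits:
        raise ValueError("require 0 < m_bits < n_bits")
    shift = n_bits - m_bits
    magnitude = abs(value) >> shift       # discard the low-order bits
    return -magnitude if value < 0 else magnitude


def dequantize(level, n_bits, m_bits):
    """Scale a quantized level back to the original n-bit range."""
    return level * (1 << (n_bits - m_bits))


level = quantize(1000, 10, 6)             # 10-bit value reduced to 6 bits
reconstructed = dequantize(level, 10, 6)  # close to, but not equal to, 1000
```

The difference between the original and reconstructed values is the quantization error, which is the source of loss in this stage.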

Once quantization is performed, the coded bitstream includes quantized transform coefficients, prediction information (e.g., prediction modes, motion vectors, or the like), partitioning information, and any other suitable data, such as other syntax data. The different elements of the coded bitstream may then be entropy encoded by the encoder engine 106. In some examples, the encoder engine 106 may utilize a predefined scan order to scan the quantized transform coefficients to produce a serialized vector that can be entropy encoded. In some examples, encoder engine 106 may perform an adaptive scan. After scanning the quantized transform coefficients to form a one-dimensional vector, the encoder engine 106 may entropy encode the one-dimensional vector. For example, the encoder engine 106 may use context adaptive variable length coding, context adaptive binary arithmetic coding, syntax-based context-adaptive binary arithmetic coding, probability interval partitioning entropy coding, or another suitable entropy encoding technique.
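A predefined scan that serializes a two-dimensional block of quantized coefficients into a one-dimensional vector can be sketched as follows. The diagonal order used here is chosen only for illustration and is not asserted to be the scan order of any particular standard; the function name is hypothetical.

```python
def diagonal_scan(block):
    """Serialize a square 2-D block along its anti-diagonals.

    Each diagonal is walked from its bottom-left sample to its top-right
    sample. Illustrative scan order only.
    """
    n = len(block)
    serialized = []
    for s in range(2 * n - 1):                       # s = row + col
        first_row = min(s, n - 1)
        last_row = max(0, s - n + 1)
        for row in range(first_row, last_row - 1, -1):
            serialized.append(block[row][s - row])
    return serialized


vector = diagonal_scan([[1, 2, 3],
                        [4, 5, 6],
                        [7, 8, 9]])
```

The resulting one-dimensional vector is what would then be passed to the entropy encoder.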

As previously described, an HEVC bitstream includes a group of NAL units. A sequence of bits forming the coded video bitstream is present in VCL NAL units. Non-VCL NAL units may contain parameter sets with high-level information relating to the encoded video bitstream, in addition to other information. For example, a parameter set may include a video parameter set (VPS), a sequence parameter set (SPS), and a picture parameter set (PPS). The goal of the parameter sets is bit rate efficiency, error resiliency, and providing systems layer interfaces. Each slice references a single active PPS, SPS, and VPS to access information that the decoding device 112 may use for decoding the slice. An identifier (ID) may be coded for each parameter set, including a VPS ID, an SPS ID, and a PPS ID. An SPS includes an SPS ID and a VPS ID. A PPS includes a PPS ID and an SPS ID. Each slice header includes a PPS ID. Using the IDs, active parameter sets can be identified for a given slice.
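The ID chain described above, from slice header to PPS to SPS to VPS, can be sketched as a sequence of table lookups. The class and function names below are hypothetical, and each parameter set is reduced to its identifiers only; real parameter sets carry many more fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VPS:
    vps_id: int


@dataclass(frozen=True)
class SPS:
    sps_id: int
    vps_id: int          # the VPS this SPS refers to


@dataclass(frozen=True)
class PPS:
    pps_id: int
    sps_id: int          # the SPS this PPS refers to


def resolve_active_parameter_sets(slice_pps_id, pps_table, sps_table, vps_table):
    """Follow the PPS ID in a slice header to the active PPS, SPS, and VPS."""
    pps = pps_table[slice_pps_id]
    sps = sps_table[pps.sps_id]
    vps = vps_table[sps.vps_id]
    return pps, sps, vps
```

Because a slice header carries only a PPS ID, changing which SPS or VPS is active requires no change to the slice data itself, only to the referenced parameter sets.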

A PPS includes information that applies to all slices in a given picture. Because of this, all slices in a picture refer to the same PPS. Slices in different pictures may also refer to the same PPS. An SPS includes information that applies to all pictures in a same coded video sequence or bitstream. A coded video sequence is a series of access units that starts with a random access point picture (e.g., an instantaneous decoding refresh (IDR) picture or broken link access (BLA) picture, or other appropriate random access point picture) and includes all access units up to but not including the next random access point picture (or the end of the bitstream). The information in an SPS does not typically change from picture to picture within a coded video sequence. All pictures in a coded video sequence use the same SPS. The VPS includes information that applies to all layers within a coded video sequence or bitstream. The VPS includes a syntax structure with syntax elements that apply to entire coded video sequences. In some embodiments, the VPS, SPS, or PPS may be transmitted in-band with the encoded bitstream. In some embodiments, the VPS, SPS, or PPS may be transmitted out-of-band in a separate transmission than the NAL units containing coded video data.
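The definition of a coded video sequence given above can be sketched as a simple grouping of access units. In this minimal illustration, each access unit is represented as a hypothetical (label, is_random_access_point) pair, and the bitstream is assumed to begin with a random access point picture.

```python
def split_into_coded_video_sequences(access_units):
    """Group access units into coded video sequences.

    A new sequence starts at every random access point and runs up to,
    but not including, the next one (or the end of the bitstream).
    """
    sequences = []
    for label, is_random_access_point in access_units:
        if is_random_access_point:
            sequences.append([])
        elif not sequences:
            raise ValueError("bitstream must start with a random access point")
        sequences[-1].append(label)
    return sequences


sequences = split_into_coded_video_sequences([
    ("IDR0", True), ("P1", False), ("P2", False),
    ("BLA3", True), ("P4", False),
])
```

Here the five access units form two coded video sequences, one starting at the IDR picture and one starting at the BLA picture.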

The output 110 of the encoding device 104 may send the NAL units making up the encoded video data over the communications link 120 to the decoding device 112 of the receiving device. The input 114 of the decoding device 112 may receive the NAL units. The communications link 120 may include a signal transmitted using a wireless network, a wired network, or a combination of a wired and wireless network. A wireless network may include any wireless interface or combination of wireless interfaces and may include any suitable wireless network (e.g., the Internet or other wide area network, a packet-based network, WiFi™, radio frequency (RF), UWB, WiFi-Direct, cellular, Long-Term Evolution (LTE), WiMax™, or the like). A wired network may include any wired interface (e.g., fiber, Ethernet, powerline Ethernet, Ethernet over coaxial cable, digital subscriber line (DSL), or the like). The wired and/or wireless networks may be implemented using various equipment, such as base stations, routers, access points, bridges, gateways, switches, or the like. The encoded video data may be modulated according to a communication standard, such as a wireless communication protocol, and transmitted to the receiving device.

In some examples, the encoding device 104 may store encoded video data in storage 108. The output 110 may retrieve the encoded video data from the encoder engine 106 or from the storage 108. Storage 108 may include any of a variety of distributed or locally accessed data storage media. For example, the storage 108 may include a hard drive, a storage disc, flash memory, volatile or non-volatile memory, or any other suitable digital storage media for storing encoded video data.

The input 114 receives the encoded video data and may provide the video data to the decoder engine 116 or to storage 118 for later use by the decoder engine 116. The decoder engine 116 may decode the encoded video data by entropy decoding (e.g., using an entropy decoder) and extracting the elements of the coded video sequence making up the encoded video data. The decoder engine 116 may then rescale and perform an inverse transform on the encoded video data. Residues are then passed to a prediction stage of the decoder engine 116. The decoder engine 116 then predicts a block of pixels (e.g., a PU). In some examples, the prediction is added to the output of the inverse transform.

The decoding device 112 may output the decoded video to a video destination device 122, which may include a display or other output device for displaying the decoded video data to a consumer of the content. In some aspects, the video destination device 122 may be part of the receiving device that includes the decoding device 112. In some aspects, the video destination device 122 may be part of a separate device other than the receiving device.

In some embodiments, the video encoding device 104 and/or the video decoding device 112 may be integrated with an audio encoding device and audio decoding device, respectively. The video encoding device 104 and/or the video decoding device 112 may also include other hardware or software that is necessary to implement the coding techniques described above, such as one or more microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware or any combinations thereof. The video encoding device 104 and the video decoding device 112 may be integrated as part of a combined encoder/decoder (codec) in a respective device. An example of specific details of the encoding device 104 is described below with reference to FIG. 16. An example of specific details of the decoding device 112 is described below with reference to FIG. 17.

As noted above, extensions to the HEVC standard include the Multiview Video Coding extension, referred to as MV-HEVC, and the Scalable Video Coding extension, referred to as SHVC. The MV-HEVC and SHVC extensions share the concept of layered coding, with different layers being included in the encoded video bitstream. Each layer in a coded video sequence is addressed by a unique layer identifier (ID). A layer ID may be present in a header of a NAL unit to identify a layer with which the NAL unit is associated. In MV-HEVC, different layers usually represent different views of the same scene in the video bitstream. In SHVC, different scalable layers are provided that represent the video bitstream in different spatial resolutions (or picture resolution) or in different reconstruction fidelities. The scalable layers may include a base layer (with layer ID = 0) and one or more enhancement layers (with layer IDs = 1, 2, . . . n). The base layer may conform to a profile of the first version of HEVC, and represents the lowest available layer in a bitstream. The enhancement layers have increased spatial resolution, temporal resolution or frame rate, and/or reconstruction fidelity (or quality) as compared to the base layer. The enhancement layers are hierarchically organized and may (or may not) depend on lower layers. In some examples, the different layers may be coded using a single standard codec (e.g., all layers are encoded using HEVC, SHVC, or other coding standard). In some examples, different layers may be coded using a multi-standard codec. For example, a base layer may be coded using AVC, while one or more enhancement layers may be coded using SHVC and/or MV-HEVC extensions to the HEVC standard.
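As an illustration of how a layer ID can be read from a NAL unit header, the sketch below assumes the two-byte NAL unit header layout of the HEVC specification (a 1-bit forbidden_zero_bit, a 6-bit nal_unit_type, a 6-bit nuh_layer_id, and a 3-bit nuh_temporal_id_plus1). The field names follow that specification and are not defined in this description; the function name is hypothetical.

```python
def parse_nal_unit_header(header: bytes) -> dict:
    """Extract the fields of an assumed two-byte HEVC NAL unit header."""
    if len(header) < 2:
        raise ValueError("NAL unit header requires at least two bytes")
    value = int.from_bytes(header[:2], "big")
    return {
        "forbidden_zero_bit": (value >> 15) & 0x1,
        "nal_unit_type": (value >> 9) & 0x3F,
        "nuh_layer_id": (value >> 3) & 0x3F,     # layer the NAL unit belongs to
        "nuh_temporal_id_plus1": value & 0x7,
    }


base_layer_header = parse_nal_unit_header(bytes([0x40, 0x01]))
enhancement_header = parse_nal_unit_header(bytes([0x02, 0x09]))
```

In this sketch the first header carries layer ID 0 (a base layer NAL unit) and the second carries layer ID 1 (an enhancement layer NAL unit).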

In general, a layer includes a set of VCL NAL units and a corresponding set of non-VCL NAL units. Each of the NAL units is assigned a particular layer ID value. Layers can be hierarchical in the sense that a layer may depend on a lower layer. A layer set refers to a set of layers represented within a bitstream that are self-contained, meaning that the layers within a layer set can depend on other layers in the layer set in the decoding process, but do not depend on any other layers for decoding. Accordingly, the layers in a layer set can form an independent bitstream that can represent video content. The set of layers in a layer set may be obtained from another bitstream by operation of a sub-bitstream extraction process. A layer set may correspond to the set of layers that is to be decoded when a decoder wants to operate according to certain parameters.
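The self-containment property of a layer set can be illustrated with a minimal sketch. The dependency map and the function name below are hypothetical and are not taken from any standard; the sketch only checks that no layer in a candidate layer set depends on a layer outside that set.

```python
def is_self_contained(layer_set, direct_deps):
    """Return True if no layer in layer_set depends on a layer outside it.

    direct_deps is a hypothetical map: layer ID -> layer IDs it depends on.
    """
    members = set(layer_set)
    return all(dep in members
               for layer in members
               for dep in direct_deps.get(layer, ()))


# Assumed example: layer 1 depends on layer 0, layer 2 depends on layer 1.
direct_deps = {0: [], 1: [0], 2: [1]}
print(is_self_contained([0, 1], direct_deps))  # layers 0 and 1 suffice
print(is_self_contained([0, 2], direct_deps))  # layer 2 needs layer 1
```

In this assumed example, the set containing layers 0 and 1 is self-contained, while the set containing layers 0 and 2 is not, because layer 2 depends on the missing layer 1.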

When encoding a video sequence, it is beneficial to have control over the decoder buffer state for many applications. This applies for communications and/or broadcasting. The encoder should provide the transmitted data such that it is available at the decoder at a decoding time of the corresponding picture. Further, the encoder should provide that the bitstream does not overrun the input bitstream buffer of the decoder as well as the picture buffer in which the decoded pictures are stored.

A hypothetical reference decoder (HRD) is provided to test control over an encoded video sequence. The HRD may be generally operable with video sequences encoded according to a video compression standard. The parameters for configuration and operation of the hypothetical reference decoder can be provided in a video parameter set (VPS) and/or in a sequence parameter set (SPS). The HRD parameters can be provided for multiple operation points for the bitstream, as detailed below. This provides information on the characteristics of the bitstream after further processing (e.g., sub-bitstream extraction). The HRD can be applied in encoders to control the produced bitstream and can also be applied to verify the conformance of a given bitstream to standards specification requirements. Further, conformance of the subject decoder implementation may be tested against the performance and timing requirements defined by the HRD. An encoder may selectively omit some or all signaling of HRD parameters for a bitstream, or for some or all layers of a bitstream. This may provide some constraints related to verification of bitstream conformance to a video compression standard.

Sets of HRD parameters are provided (e.g., in a sequence or video parameter set, or in other messaging) to allow for multi-layer functionality, with each set of parameters corresponding to an operation point. An operation point defines the parameters used for sub-bitstream extraction, and includes a list of target layers (a layer set for that operation point) and a target highest temporal layer. Multiple operation points may be applicable to a given bitstream. An operation point may either include all the layers in a layer set or may be a bitstream formed as a subset of the layer set. For example, an operation point of a bitstream may be associated with a set of layer identifiers and a temporal identifier. A layer identifier list may be used to identify the layers to be included in the operation point. The layer identifier list may be included in a parameter set (e.g., a VPS). The layer identifier list may include a list of layer identifier (ID) values (e.g., indicated by a syntax element nuh_layer_id). In some cases, the layer ID values may include non-negative integers, and each layer may be associated with a unique layer ID value so that each layer ID value identifies a particular layer. A highest temporal ID (e.g., identified by a variable TemporalId) may be used to define a temporal subset. In some embodiments, a layer identifier list and a target highest temporal ID may be used as inputs to extract an operation point from a bitstream. For example, when a NAL unit has a layer identifier that is included in a set of layer identifiers associated with an operation point, and the temporal identifier of the NAL unit is less than or equal to the temporal identifier of the operation point, the NAL unit is associated with the operation point. A target output layer is a layer that is to be output, and an output layer set is a layer set that is associated with a set of target output layers. For example, an output layer set is a set of layers including the layers of a specified layer set, where one or more layers in the set of layers are indicated to be output layers. An output operation point corresponds to a particular output layer set. For example, an output operation point may include a bitstream that is created from an input bitstream by operation of a sub-bitstream extraction process with the input bitstream, a target highest temporal identifier (TemporalId), and a target layer identifier list as inputs, and that is associated with a set of output layers.
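The association rule between NAL units and an operation point can be sketched as a simple filter. The NalUnit type and the function name are hypothetical, and the sketch is a simplification: an actual sub-bitstream extraction process may apply additional rules that are not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NalUnit:
    # Hypothetical, simplified view of a NAL unit header.
    nuh_layer_id: int
    temporal_id: int
    payload: bytes = b""


def extract_operation_point(nal_units, target_layer_ids, target_highest_tid):
    """Keep the NAL units associated with the operation point.

    A NAL unit is kept when its layer ID is in the target layer identifier
    list and its temporal ID does not exceed the target highest temporal ID.
    """
    layer_ids = set(target_layer_ids)
    return [nal for nal in nal_units
            if nal.nuh_layer_id in layer_ids
            and nal.temporal_id <= target_highest_tid]


bitstream = [NalUnit(0, 0), NalUnit(0, 1), NalUnit(1, 0), NalUnit(2, 0)]
sub_bitstream = extract_operation_point(bitstream, [0, 1], 0)
print([(n.nuh_layer_id, n.temporal_id) for n in sub_bitstream])
```

With the assumed input above, only the NAL units of layers 0 and 1 at temporal ID 0 remain in the extracted sub-bitstream.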

As previously described, parameter sets are provided with an encoded video bitstream (e.g., in one or more non-VCL NAL units). The parameter sets contain high-level syntax information defining various parameters of the encoded video bitstream. One example of a parameter set includes a video parameter set (VPS). The VPS may have two parts, including a base part (or base VPS) and an extension part (or VPS extension). The base VPS is defined in the first edition of the HEVC standard, and the VPS extension is defined in a later edition of the HEVC standard. The base VPS may contain information related to the HEVC base layer (or compatible layer). The base VPS may also contain temporal scalability information, including a maximum number of temporal layers. One or more layer sets may be defined in the base VPS. For example, the base VPS may define a layer set 0 that corresponds to a layer set including the base layer. The VPS extension may contain information related to one or more additional layers beyond the base layer. For example, one or more additional layer sets may be defined in the VPS extension, which are not defined in the base part.

FIG. 2 illustrates an example of layer sets defined in a base part (base VPS 202) and an extension part (VPS extension 204) of a video parameter set. The base VPS 202 defines layer set 0, layer set 1, layer set 2, and layer set 3. The layer set 0 includes layer 0. The layer set 1 includes layer 0 and layer 1. The layer set 2 includes layer 0, layer 1, and layer 2. The layer set 3 includes layer 0, layer 1, layer 2, and layer 3. The VPS extension 204 defines additional layer sets that are not defined in the base VPS 202. The additional layer sets include layer set 4 and layer set 5. The additional layer set 4 includes layer 4, and the additional layer set 5 includes layer 5 and layer 6. In some examples, layer 0 may be a base layer, and layers 1, 2, 3, 4, 5, and 6 may be enhancement layers. For example, the layer 0 may be a base layer with a layer identifier (ID) equal to 0. The base layer may also be referred to as a compatible layer. The base layer conforms to a profile of the first version of HEVC, and represents the lowest available layer in a bitstream. The layers 1, 2, 3, 4, 5, and 6 may include enhancement layers having corresponding layer IDs. For example, layer 1 has a layer ID equal to 1, layer 2 has a layer ID equal to 2, layer 3 has a layer ID equal to 3, layer 4 has a layer ID equal to 4, layer 5 has a layer ID equal to 5, and layer 6 has a layer ID equal to 6. Enhancement layers have increased spatial resolution, temporal resolution or frame rate, and/or reconstruction fidelity (or quality) as compared to the base layer. In some examples, layer 0 may have a frame rate of 7.5 Hz and a bit rate of 64 kilobytes per second, layer 1 may have a frame rate of 15 Hz and a bit rate of 128 kilobytes per second, layer 2 may have a frame rate of 15 Hz and a bit rate of 256 kilobytes per second, layer 3 may have a frame rate of 30 Hz and a bit rate of 512 kilobytes per second, layer 4 may have a frame rate of 30 Hz and a bit rate of 1 megabyte per second, layer 5 may have a frame rate of 60 Hz and a bit rate of 1.5 megabytes per second, and layer 6 may have a frame rate of 60 Hz and a bit rate of 2 megabytes per second. In some examples, frame rates may also be referred to as picture rates, and thus the different layers 0, 1, 2, 3, 4, 5, and 6 may also have different picture rates. One of ordinary skill in the art will appreciate that these numbers are provided as an example only, and that the layers may have other frame rates and bit rates according to the particular implementation.

Signaling information is provided in the VPS that defines characteristics of one or more layer sets defined in the base VPS 202. In some examples, the signaling information may define rate information for the one or more layer sets. Rate information includes, for example, bit rate information, picture rate information, or other suitable rate information that applies to the layers in a given layer set. In one example, bit rate information for a given layer set may include an average bit rate or an average picture rate of the layers of the given layer set. In another example, the bit rate information may include a maximum bit rate of the layers of a given layer set. Other examples of rate information are provided below. In some examples, the signaling information may include target output information indicating whether a layer in a layer set is a target output layer of an output layer set. For example, the target output information may include an output_layer_flag[i][j] syntax element. As used herein, the indices [i] and [j] refer to the j-th layer of the i-th layer set. The rate information and the target output information should be signaled for all layer sets (defined in the base and extension parts of the VPS), including the layer sets and the additional layer sets, as clients may choose to request or consume an additional layer set based on such information. However, with the current signaling scheme defined in the HEVC standard, signaling information is only signaled for layer sets that are defined in the base part of the VPS.

The number of layer sets that are signaled in the base VPS (e.g., base VPS 202) is indicated by a syntax element of the VPS. For example, FIG. 3 illustrates an example of a syntax structure 300 of a VPS extension. The entry 302 includes syntax element 306, labeled vps_num_layer_sets_minus1, that indicates the number of layer sets that are signaled in the base VPS. The syntax element 304, labeled output_layer_flag[i][j], indicates whether a layer in a layer set is a target output layer of an output layer set. Because the vps_num_layer_sets_minus1 syntax element 306 indicates the number of layer sets signaled in the base VPS (and not the additional layer sets signaled in the VPS extension), the output_layer_flag[i][j] syntax element 304 is only signaled for those layer sets defined in the base VPS.

The total number of layer sets that are signaled in the base VPS and VPS extension (including the additional layer sets signaled in the VPS extension, if present) is indicated by a variable NumLayerSets that is derived based on syntax elements of the VPS. Embodiments described herein include updating the signaling of information in the VPS related to layer sets so that the signaling information (e.g., rate information and target output information) is signaled for all layer sets, including the additional layer sets defined in the VPS extension 204. For example, as illustrated in FIG. 4, the vps_num_layer_sets_minus1 syntax element 306 may be removed from the VPS extension and a NumLayerSets variable 406 may be added to the entry 302 to create a new syntax structure 400. Because the NumLayerSets variable 406 indicates the total number of layer sets that are signaled in the base VPS and the VPS extension, the output_layer_flag[i][j] syntax element 304 is signaled for the layer sets defined in the base VPS and the additional layer sets defined in the VPS extension.
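The effect of changing the loop bound can be sketched using the layer sets of FIG. 2. The following is an illustrative model only: the helper name is hypothetical, the loop is assumed to start at layer set index 0, and the actual syntax structures of FIG. 3 and FIG. 4 may contain further conditions that are not modeled here.

```python
# Layer sets from the FIG. 2 example, each listed by layer ID.
BASE_VPS_LAYER_SETS = [[0], [0, 1], [0, 1, 2], [0, 1, 2, 3]]  # layer sets 0-3
VPS_EXTENSION_LAYER_SETS = [[4], [5, 6]]                       # layer sets 4-5

vps_num_layer_sets_minus1 = len(BASE_VPS_LAYER_SETS) - 1
NumLayerSets = len(BASE_VPS_LAYER_SETS) + len(VPS_EXTENSION_LAYER_SETS)


def flagged_positions(loop_bound):
    """Return the (i, j) pairs for which output_layer_flag[i][j] is signaled."""
    all_layer_sets = BASE_VPS_LAYER_SETS + VPS_EXTENSION_LAYER_SETS
    return [(i, j)
            for i in range(loop_bound)
            for j in range(len(all_layer_sets[i]))]


old = flagged_positions(vps_num_layer_sets_minus1 + 1)  # bound as in FIG. 3
new = flagged_positions(NumLayerSets)                   # bound as in FIG. 4
print(sorted({i for i, _ in old}))  # layer sets covered by the old bound
print(sorted({i for i, _ in new}))  # layer sets covered by the new bound
```

Under these assumptions, the old bound reaches only layer sets 0 through 3, while the NumLayerSets bound also reaches the additional layer sets 4 and 5.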

FIG. 5 illustrates another example of a syntax structure 500 of a VPS. The syntax structure 500 is part of a video usability information (VUI) portion of the VPS extension, which may be referred to herein as the VPS VUI. The VPS VUI syntax structure contains information that is useful for preparing the decoded video for output and display. The VPS VUI may include information related to the encoded video, such as rate information, sample aspect ratio, the original color space and representation of the encoded video, picture timing information, or other information. The inclusion of different parts in the VUI syntax structure is optional and can be decided as required by a particular implementation or application. In some examples, default values may be specified for some or all VUI parameters for cases in which the corresponding VUI parameters have not been provided.

In the example of FIG. 5, the syntax structure 500 of the VPS VUI includes a bit_rate_present_flag[i][j] syntax element 504 that includes a flag indicating whether bit rate information is available for one or more layer sets signaled in the VPS. For example, a value of 0 or 1 for the flag may indicate that bit rate information is available for the one or more layer sets. The syntax structure 500 of the VPS VUI further includes a pic_rate_present_flag[i][j] syntax element 506 that includes a flag indicating whether picture rate information is available for one or more layer sets signaled in the VPS. For example, a value of 0 or 1 for the flag may indicate that picture rate information is available for the one or more layer sets. The syntax structure 500 of the VPS VUI also includes an avg_bit_rate[i][j] syntax element 508 that indicates an average bit rate for each layer set of the one or more layer sets signaled in the VPS. The syntax structure 500 of the VPS VUI further includes a max_bit_rate[i][j] syntax element 510 that indicates a maximum bit rate for each layer set of the one or more layer sets signaled in the VPS. The syntax structure 500 of the VPS VUI also includes a constant_pic_rate_idc[i][j] syntax element 512 that indicates whether a layer set of the one or more layer sets signaled in the VPS has a constant picture rate. The syntax structure 500 of the VPS VUI further includes an avg_pic_rate[i][j] syntax element 514 that indicates an average picture rate for each layer set of the one or more layer sets signaled in the VPS. One of ordinary skill in the art will appreciate that syntax elements 504-514 are examples, and that more or fewer sets of signaling information may be present in the syntax structure 500 of the VPS VUI.

The information provided in syntax elements 504-514 is signaled for those layer sets that are defined in the VPS extension, which is provided in entry 502 of the syntax structure 500. The entry 502 includes a syntax element that indicates the number of layer sets that are signaled. The entry 502 shown in FIG. 5 includes the vps_num_layer_sets_minus1 syntax element 516, which indicates the number of layer sets signaled in the base VPS (and not the additional layer sets signaled in the VPS extension). Accordingly, the rate information syntax elements 504-514 are only signaled for those layer sets defined in the base VPS. FIG. 6 illustrates an example of a syntax structure 600 of the VPS VUI with updated signaling information that relates to all layer sets, including the additional layer sets defined in the VPS extension. In the example of FIG. 6, the vps_num_layer_sets_minus1 syntax element 516 is removed from the VPS VUI and a NumLayerSets variable 616 is added to the entry 502 to create the new syntax structure 600. Because the NumLayerSets variable 616 indicates the total number of layer sets that are signaled in the base VPS and the VPS extension, the rate information signaled in the syntax elements 504-514 is signaled for the layer sets defined in the base VPS and the additional layer sets defined in the VPS extension.
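A round trip of the rate information, iterating over all NumLayerSets layer sets, can be sketched as follows. This is a simplified illustration under stated assumptions: the bit widths (1, 16, and 2 bits), the one-record-per-layer-set layout, the convention that a flag value of 1 means the information is present, and the BitWriter and BitReader helpers are all hypothetical and are not taken from FIG. 5 or FIG. 6.

```python
class BitWriter:
    def __init__(self):
        self.bits = []

    def u(self, n, value):
        """Append value as an n-bit unsigned integer, most significant bit first."""
        if not 0 <= value < (1 << n):
            raise ValueError("value does not fit in n bits")
        self.bits.extend((value >> (n - 1 - k)) & 1 for k in range(n))


class BitReader:
    def __init__(self, bits):
        self.bits = bits
        self.pos = 0

    def u(self, n):
        """Read an n-bit unsigned integer, most significant bit first."""
        value = 0
        for _ in range(n):
            value = (value << 1) | self.bits[self.pos]
            self.pos += 1
        return value


def write_rate_info(writer, records):
    # One record per layer set, for i in range(NumLayerSets).
    for rec in records:
        bit_rate, pic_rate = rec["bit_rate"], rec["pic_rate"]
        writer.u(1, int(bit_rate is not None))  # bit_rate_present_flag
        writer.u(1, int(pic_rate is not None))  # pic_rate_present_flag
        if bit_rate is not None:
            writer.u(16, bit_rate[0])           # avg_bit_rate (assumed 16 bits)
            writer.u(16, bit_rate[1])           # max_bit_rate (assumed 16 bits)
        if pic_rate is not None:
            writer.u(2, pic_rate[0])            # constant_pic_rate_idc (assumed 2 bits)
            writer.u(16, pic_rate[1])           # avg_pic_rate (assumed 16 bits)


def read_rate_info(reader, num_layer_sets):
    records = []
    for _ in range(num_layer_sets):
        bit_rate_present_flag = reader.u(1)
        pic_rate_present_flag = reader.u(1)
        bit_rate = (reader.u(16), reader.u(16)) if bit_rate_present_flag else None
        pic_rate = (reader.u(2), reader.u(16)) if pic_rate_present_flag else None
        records.append({"bit_rate": bit_rate, "pic_rate": pic_rate})
    return records


# Six records: four for base VPS layer sets, two for additional layer sets.
records = [{"bit_rate": (64, 128), "pic_rate": (1, 30)}] * 4 + [
    {"bit_rate": (1000, 2000), "pic_rate": None},
    {"bit_rate": None, "pic_rate": (0, 60)},
]
writer = BitWriter()
write_rate_info(writer, records)
decoded = read_rate_info(BitReader(writer.bits), num_layer_sets=len(records))
print(decoded == records)
```

Because the reader loops over the same total number of layer sets as the writer, the records for the additional layer sets are recovered alongside those for the base VPS layer sets.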

FIG. 7 illustrates an embodiment of a process 700 of encoding video data. The process 700 is implemented to signal information for layer sets (including additional layer sets) defined in a parameter set, such as a video parameter set. In some aspects, the process 700 may be performed by a computing device or an apparatus, such as the encoding device 104 shown in FIG. 1 or FIG. 16. For example, the computing device or apparatus may include an encoder, or a processor, microprocessor, microcomputer, or other component of an encoder that is configured to carry out the steps of process 700.

Process 700 is illustrated as a logical flow diagram, the operation of which represents a sequence of operations that can be implemented in hardware, computer instructions, or a combination thereof. In the context of computer instructions, the operations represent computer-executable instructions stored on one or more computer-readable storage media that, when executed by one or more processors, perform the recited operations. Generally, computer-executable instructions include routines, programs, objects, components, data structures, and the like that perform particular functions or implement particular data types. The order in which the operations are described is not intended to be construed as a limitation, and any number of the described operations can be combined in any order and/or in parallel to implement the processes.

Additionally, the process 700 may be performed under the control of one or more computer systems configured with executable instructions and may be implemented as code (e.g., executable instructions, one or more computer programs, or one or more applications) executing collectively on one or more processors, by hardware, or combinations thereof. As noted above, the code may be stored on a computer-readable or machine-readable storage medium, for example, in the form of a computer program comprising a plurality of instructions executable by one or more processors. The computer-readable or machine-readable storage medium may be non-transitory.

At 702, the process 700 of encoding video data includes generating an encoded video bitstream comprising one or more layer sets and one or more additional layer sets. Each of a layer set and an additional layer set includes one or more layers, as previously described. The encoded video bitstream includes a video parameter set defining parameters of the encoded video bitstream. The one or more layer sets are defined in a base part of the video parameter set, and the one or more additional layer sets are defined in an extension part of the video parameter set. The encoded video bitstream may be encoded using an HEVC coding technique, or other suitable coding technique. In one example, the one or more layer sets defined in the base part of the video parameter set (VPS) include the layer set 0, the layer set 1, the layer set 2, and the layer set 3 defined in the base VPS 202 shown in FIG. 2, and the one or more additional layer sets include the layer set 4 and the layer set 5 defined in the VPS extension 204 shown in FIG. 2. One of ordinary skill in the art will appreciate that the one or more layer sets and/or the one or more additional layer sets may include other layer sets than those shown in the examples of FIG. 2.

At 704, the process 700 includes providing, in the video parameter set, one or more syntax elements for signaling information related to the one or more layer sets and the one or more additional layer sets. The information includes rate information for the one or more layer sets defined in the base part of the video parameter set and for the one or more additional layer sets defined in the extension part of the video parameter set. Accordingly, the rate information is signaled for both the layer sets defined in the base VPS and for the additional layer sets defined in the VPS extension. For example, the rate information may be signaled for the one or more layer sets defined in the base part of the video parameter set and for the one or more additional layer sets by inserting the NumLayerSets variable 616 in the entry 502 of the VPS VUI. In some embodiments, different rate information is signaled for each different layer set of the one or more layer sets and the one or more additional layer sets. For example, a first set of rate information may be signaled for the layer set 0 defined in the base VPS 202, and a second set of rate information may be signaled for the layer set 4 defined in the VPS extension 204.

In some embodiments, the rate information includes bit rate information. In some embodiments, the rate information includes picture rate information. In some examples, the rate information may be included in any of the syntax elements 504-514 shown in FIG. 5 and FIG. 6. For example, the one or more syntax elements in the video parameter set include a flag that indicates whether bit rate information is available for an additional layer set. The flag may be set to a value of 0 or 1 to indicate that bit rate information is available for the additional layer set. The one or more syntax elements may also include a flag indicating whether bit rate information is available for a layer set defined in the base part of the VPS. An example of such a flag is the bit_rate_present_flag[i][j] syntax element 504 shown in FIG. 5 and FIG. 6.

In another example, the one or more syntax elements in the video parameter set include a flag that indicates whether picture rate information is available for an additional layer set. The flag may be set to a value of 0 or 1 to indicate that picture rate information is available for the additional layer set. The one or more syntax elements may also include a flag indicating whether picture rate information is available for a layer set defined in the base part of the VPS. An example of such a flag is the pic_rate_present_flag[i][j] syntax element 506 shown in FIG. 5 and FIG. 6.

In another example, the one or more syntax elements in the video parameter set include a syntax element that indicates an average bit rate for an additional layer set. The one or more syntax elements may also include a similar syntax element indicating an average bit rate for a layer set defined in the base part of the VPS. An example of such a syntax element is the avg_bit_rate[i][j] syntax element 508 shown in FIG. 5 and FIG. 6.

In another example, the one or more syntax elements in the video parameter set include a syntax element that indicates a maximum bit rate for an additional layer set. The one or more syntax elements may also include a similar syntax element indicating a maximum bit rate for a layer set defined in the base part of the VPS. An example of such a syntax element is the max_bit_rate[i][j] syntax element 510 shown in FIG. 5 and FIG. 6.

In another example, the one or more syntax elements in the video parameter set include a syntax element that indicates whether an additional layer set has a constant picture rate. The one or more syntax elements may also include a similar syntax element indicating whether a layer set defined in the base part of the VPS has a constant picture rate. An example of such a syntax element is the constant_pic_rate_idc[i][j] syntax element 512 shown in FIG. 5 and FIG. 6.

In another example, the one or more syntax elements in the video parameter set include a syntax element that indicates an average picture rate for an additional layer set. The one or more syntax elements may also include a similar syntax element indicating an average picture rate for a layer set defined in the base part of the VPS. An example of such a syntax element is the avg_pic_rate[i][j] syntax element 514 shown in FIG. 5 and FIG. 6.

In some embodiments, the one or more syntax elements may signal target output information for both the layer sets defined in the base VPS and for the additional layer sets defined in the VPS extension. For example, the one or more syntax elements in the video parameter set include a flag that indicates whether a layer in an additional layer set is a target output layer of an output layer set. The flag may be set to a value of 0 or 1 to indicate that the layer in the additional layer set is a target output layer of an output layer set. The one or more syntax elements may also include a similar flag indicating whether a layer in a layer set defined in the base VPS is a target output layer of an output layer set. An example of such a flag is the output_layer_flag[i][j] syntax element 304 shown in FIG. 3 and FIG. 4.
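As a minimal sketch of how target output information might be interpreted, the following derives the target output layers of each layer set from output_layer_flag[i][j] values. The function name and the flag values are hypothetical, and the sketch assumes that a value of 1 marks a target output layer, which is only one of the two polarities contemplated above.

```python
def target_output_layers(layer_sets, output_layer_flag):
    """For each layer set i, list the layers j whose flag equals 1.

    Assumes a flag value of 1 marks a target output layer.
    """
    return [[layer for layer, flag in zip(layers, flags) if flag == 1]
            for layers, flags in zip(layer_sets, output_layer_flag)]


# Layer set 3 (base VPS) and additional layer set 5 (VPS extension) of FIG. 2.
layer_sets = [[0, 1, 2, 3], [5, 6]]
# Hypothetical flag values: output only the highest layer of each set.
flags = [[0, 0, 0, 1], [0, 1]]
print(target_output_layers(layer_sets, flags))
```

With the assumed flag values, layer 3 is the only target output layer of layer set 3, and layer 6 is the only target output layer of additional layer set 5.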

FIG. 8 illustrates an embodiment of a process 800 of decoding video data. The process 800 is implemented to receive and decode signaling information for layer sets (including additional layer sets) defined in a parameter set, such as a video parameter set. In some aspects, the process 800 may be performed by a computing device or an apparatus, such as the decoding device 112 shown in FIG. 1 or in FIG. 17. For example, the computing device or apparatus may include a decoder, or a processor, microprocessor, microcomputer, or other component of a decoder that is configured to carry out the steps of process 800.

Process 800 is illustrated as a logical flow diagram, the operation of which represents a sequence of operations that can be implemented in hardware, computer instructions, or a combination thereof. In the context of computer instructions, the operations represent computer-executable instructions stored on one or more computer-readable storage media that, when executed by one or more processors, perform the recited operations. Generally, computer-executable instructions include routines, programs, objects, components, data structures, and the like that perform particular functions or implement particular data types. The order in which the operations are described is not intended to be construed as a limitation, and any number of the described operations can be combined in any order and/or in parallel to implement the processes.

Additionally, the process 800 may be performed under the control of one or more computer systems configured with executable instructions and may be implemented as code (e.g., executable instructions, one or more computer programs, or one or more applications) executing collectively on one or more processors, by hardware, or combinations thereof. As noted above, the code may be stored on a computer-readable or machine-readable storage medium, for example, in the form of a computer program comprising a plurality of instructions executable by one or more processors. The computer-readable or machine-readable storage medium may be non-transitory.

At 802, the process 800 of decoding video data includes obtaining an encoded video bitstream comprising one or more layer sets and one or more additional layer sets. Each of a layer set and an additional layer set includes one or more layers. The encoded video bitstream includes a video parameter set defining parameters of the encoded video bitstream. The one or more layer sets are defined in a base part of the video parameter set, and the one or more additional layer sets are defined in an extension part of the video parameter set. The encoded video bitstream may be encoded using an HEVC coding technique, or other suitable coding technique. In one example, the one or more layer sets defined in the base part of the video parameter set (VPS) include the layer set 0, the layer set 1, the layer set 2, and the layer set 3 defined in the base VPS 202 shown in FIG. 2, and the one or more additional layer sets include the layer set 4 and the layer set 5 defined in the VPS extension 204 shown in FIG. 2. One of ordinary skill in the art will appreciate that the one or more layer sets and/or the one or more additional layer sets may include other layer sets than those shown in the examples of FIG. 2.

At 804, the process 800 includes decoding one or more syntax elements from the video parameter set. The one or more syntax elements include rate information for the one or more layer sets defined in the base part of the video parameter set and for the one or more additional layer sets defined in the extension part of the video parameter set. In some embodiments, the one or more syntax elements include different rate information for each different layer set of the one or more layer sets and the one or more additional layer sets. For example, a first set of rate information may be signaled for the layer set 0 defined in the base VPS 202, and a second set of rate information may be signaled for the layer set 4 defined in the VPS extension 204.

In some embodiments, the rate information includes bit rate information. In some embodiments, the rate information includes picture rate information. In some examples, the rate information may be included in any of the syntax elements 504-514 shown in FIG. 5 and FIG. 6. For example, the one or more syntax elements in the video parameter set include a flag that indicates whether bit rate information is available for an additional layer set. The flag may be set to a value of 0 or 1 to indicate that bit rate information is available for the additional layer set. The one or more syntax elements may also include a flag indicating whether bit rate information is available for a layer set defined in the base part of the VPS. An example of such a flag is the bit_rate_present_flag[i][j] syntax element 504 shown in FIG. 5 and FIG. 6.

In another example, the one or more syntax elements in the video parameter set include a flag that indicates whether picture rate information is available for an additional layer set. The flag may be set to a value of 0 or 1 to indicate that picture rate information is available for the additional layer set. The one or more syntax elements may also include a flag indicating whether picture rate information is available for a layer set defined in the base part of the VPS. An example of such a flag is the pic_rate_present_flag[i][j] syntax element 506 shown in FIG. 5 and FIG. 6.

In another example, the one or more syntax elements in the video parameter set include a syntax element that indicates an average bit rate for an additional layer set. The one or more syntax elements may also include a similar syntax element indicating an average bit rate for a layer set defined in the base part of the VPS. An example of such a syntax element is the avg_bit_rate[i][j] syntax element 508 shown in FIG. 5 and FIG. 6.

In another example, the one or more syntax elements in the video parameter set include a syntax element that indicates a maximum bit rate for an additional layer set. The one or more syntax elements may also include a similar syntax element indicating a maximum bit rate for a layer set defined in the base part of the VPS. An example of such a syntax element is the max_bit_rate[i][j] syntax element 510 shown in FIG. 5 and FIG. 6.

In another example, the one or more syntax elements in the video parameter set include a syntax element that indicates whether an additional layer set has a constant picture rate. The one or more syntax elements may also include a similar syntax element indicating whether a layer set defined in the base part of the VPS has a constant picture rate. An example of such a syntax element is the constant_pic_rate_idc[i][j] syntax element 512 shown in FIG. 5 and FIG. 6.

In another example, the one or more syntax elements in the video parameter set include a syntax element that indicates an average picture rate for an additional layer set. The one or more syntax elements may also include a similar syntax element indicating an average picture rate for a layer set defined in the base part of the VPS. An example of such a syntax element is the avg_pic_rate[i][j] syntax element 514 shown in FIG. 5 and FIG. 6.

In some embodiments, the one or more syntax elements may signal target output information for both the layer sets defined in the base VPS and for the additional layer sets defined in the VPS extension. For example, the one or more syntax elements in the video parameter set include a flag that indicates whether a layer in an additional layer set is a target output layer of an output layer set. The flag may be set to a value of 0 or 1 to indicate that the layer in the additional layer set is a target output layer of an output layer set. The one or more syntax elements may also include a similar flag indicating whether a layer in a layer set defined in the base VPS is a target output layer of an output layer set. An example of such a flag is the output_layer_flag[i][j] syntax element 304 shown in FIG. 3 and FIG. 4.

Using the above-described techniques of signaling information for layer sets (including additional layer sets) defined in a parameter set, rate information and target output information is signaled for the layer sets defined in the base VPS and also for the additional layer sets defined in the VPS extension.

In further embodiments, techniques and systems are described for signaling hypothetical reference decoder parameters in a parameter set in only certain conditions. Hypothetical reference decoder parameters are provided in a parameter set to allow for multi-layer functionality. Different sets of hypothetical reference decoder parameters correspond to different operation points. The hypothetical reference decoder parameters can be used in various ways. For example, a bitstream conformance check may include performing a normative test using hypothetical reference decoder parameters. The normative test uses the hypothetical reference decoder parameters to check that a bitstream or sub-bitstream can be decoded by a hypothetical reference decoder that is conceptually connected to the output of an encoder and that includes a coded picture buffer, a decoder, and a decoded picture buffer. The encoder must make sure various constraints are met when creating a bitstream to meet conformance, including making sure that the tools used in the bitstream match those signaled in the parameter sets, making sure that the coded picture buffer of the hypothetical reference decoder does not overflow or underflow, making sure pictures marked as unused for reference are not used as reference afterwards, or other requirements. A buffer overflow occurs when too many coded data units are present for the decoder buffer. Underflow occurs when it is time for the decoder to process some coded data units but the buffer is empty.
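The overflow and underflow conditions described above can be illustrated with a minimal sketch. It assumes a deliberately simplified buffer model in which bits arrive at a constant rate starting at time zero and each coded picture is removed instantaneously at a given removal time. The names CodedPicture and check_cpb are hypothetical, and the model is not the normative hypothetical reference decoder.

```python
from dataclasses import dataclass


@dataclass
class CodedPicture:
    size_bits: int        # size of the coded picture in bits
    removal_time: float   # time (seconds) at which the decoder removes it


def check_cpb(pictures, bit_rate, cpb_size_bits):
    """Return "ok", "underflow", or "overflow" for a simplified buffer model.

    Assumption: bits arrive at a constant bit_rate from time 0 until the
    whole stream has arrived; pictures are removed in the order given.
    """
    total_bits = sum(p.size_bits for p in pictures)
    needed = 0    # bits that must have arrived for the current picture to be complete
    removed = 0   # bits already removed by the decoder
    for pic in pictures:
        needed += pic.size_bits
        arrived = min(total_bits, bit_rate * pic.removal_time)
        if arrived < needed:
            # The decoder wants the picture but its data is not in the buffer yet.
            return "underflow"
        if arrived - removed > cpb_size_bits:
            # More data is waiting than the buffer can hold.
            return "overflow"
        removed += pic.size_bits
    return "ok"
```

In this sketch, buffer fullness is only examined just before each removal, which is where it peaks when arrival is continuous and removal is instantaneous.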

Hypothetical reference decoder parameters may be signaled in the VPS and in the VPS extension (e.g., in the VPS VUI) for different operation points and associated layer sets. The signaling of the hypothetical reference decoder parameters in the VPS VUI may be controlled by a gating flag. The value of this flag can be set equal to 1 or 0 independently by encoders. In one example, hypothetical reference decoder parameters may not be signaled in the VPS VUI when a value of the gating flag is set to 0. In another example, hypothetical reference decoder parameters may be signaled in the VPS VUI when a value of the gating flag is set to 1. One of ordinary skill in the art will appreciate that, alternatively, hypothetical reference decoder parameters may not be signaled when the value is set to 1, and that hypothetical reference decoder parameters may be signaled when the value is set to 0.

Embodiments are described herein for signaling hypothetical reference decoder parameters in the VPS VUI when certain information is signaled in the VPS and/or the VPS VUI. For example, hypothetical reference decoder parameters depend on timing information provided in the VPS VUI, in the base part of the VPS, or in both the VPS VUI and the base VPS. Timing information is provided to allow a correct play-out speed of a decoded video sequence. The syntax structure for hypothetical reference decoder parameters is placed in the timing information section of the VPS VUI. In some cases, the timing information defines parameters needed to install a timing scheme for a decoding process, such as a clock rate and the length of a clock tick. The timing information may further include a flag indicating that a picture order count (defining a relation of the pictures in terms of ordering and distance if used for prediction) is proportional to the output time of the picture relative to the beginning of the coded video sequence (e.g., an intra random access picture (IRAP), such as an instantaneous decoding refresh (IDR) picture where the picture order count is reset). Using the indication provided by the flag, the picture output timing can be directly derived from the picture order count.
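As a rough illustration of how output timing could follow from the picture order count when the proportionality flag is set, consider the sketch below. The parameter names num_units_in_tick, time_scale, and num_ticks_poc_diff_one are assumptions chosen for illustration and are not defined in this description; the arithmetic simply treats the clock tick as a ratio and scales it by the picture order count.

```python
from fractions import Fraction


def clock_tick(num_units_in_tick, time_scale):
    """Length of one clock tick in seconds (hypothetical parameter names)."""
    return Fraction(num_units_in_tick, time_scale)


def output_time_from_poc(poc, num_units_in_tick, time_scale,
                         num_ticks_poc_diff_one=1):
    """Output time in seconds, relative to the picture where the POC was reset.

    Assumption: the picture order count is proportional to output time, and
    num_ticks_poc_diff_one clock ticks separate pictures whose POC differs by 1.
    """
    return poc * num_ticks_poc_diff_one * clock_tick(num_units_in_tick, time_scale)


# Example: a 30000/1001 Hz clock; the picture with POC 30 is output after
# 30 * 1001 / 30000 seconds.
t = output_time_from_poc(30, num_units_in_tick=1001, time_scale=30000)
```

Exact rational arithmetic is used so that the derived times do not accumulate floating-point rounding error.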

Signaling of hypothetical reference decoder information when timing information is not present in the VPS is an inefficient use of bits, leading to wasted processing and use of network resources. Accordingly, hypothetical reference decoder parameters may be signaled in the VPS VUI when timing information is also signaled in the VPS or the VPS VUI. Similarly, hypothetical reference decoder parameters may not be signaled in the VPS VUI when no timing information is signaled in the VPS or the VPS VUI. In some aspects, an encoder (or other device, such as an editor, splicer, or the like) may condition the gating flag to be dependent on a value of a syntax element that indicates whether timing information is present in the VPS or the VPS VUI.

In one example, the gating flag may be signaled or may not be signaled depending on the presence of the timing information. FIG. 9A illustrates an example of a syntax structure 900 of a VPS VUI with a timing information syntax element 902, labeled vps_timing_info_present_flag. The timing information syntax element 902 indicates whether timing information is included in the VPS or the VPS VUI. The syntax structure 900 further includes gating flag syntax element 904, labeled vps_vui_bsp_hrd_present_flag. The presence of the gating flag syntax element 904 is dependent on the value of the timing information syntax element 902. When the timing information syntax element 902 is set to a value of 0 (indicating that no timing information is present), the gating flag syntax element 904 may not be signaled in the VPS VUI (in which case the syntax structure 900 does not include the gating flag syntax element 904 when the VPS VUI is sent to the decoder). In such an example, the value of the gating flag syntax element 904 is determined by the encoder to be a value of 0, indicating that no hypothetical reference decoder parameters are to be signaled in the VPS VUI. Accordingly, the encoder (or other device, such as an editor, splicer, or the like) may determine not to signal hypothetical reference decoder parameters in the VPS VUI. This example is illustrated in FIG. 9A by the inclusion of the condition 906 with the timing information syntax element 902 in the syntax structure. For example, when the timing information syntax element 902 is set to a value of 0 (indicating that no timing information is present), the encoder (or other device, such as an editor, splicer, or the like) may determine not to signal hypothetical reference decoder parameters in the VPS VUI. The encoder (or other device) may then remove the gating flag syntax element 904 from the syntax structure 900. When the VPS VUI is received by the decoder (or other device receiving the VPS VUI), the decoder infers the value of the gating flag to be a value of 0 based on the absence of the gating flag syntax element 904. The decoder then determines that no hypothetical reference decoder parameters are signaled in the VPS VUI based on the inferred value of 0 for the gating flag.
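The conditional presence of the gating flag, and the decoder-side inference when it is absent, can be sketched as follows. This is a minimal illustration that assumes the two flags are adjacent one-bit fields; an actual VPS VUI carries other syntax elements between them. The BitReader class and the parse_vps_vui_flags function are hypothetical helpers, not part of any codec library.

```python
class BitReader:
    """Hypothetical reader over a sequence of 0/1 values."""

    def __init__(self, bits):
        self.bits = list(bits)
        self.pos = 0

    def read_flag(self):
        bit = self.bits[self.pos]
        self.pos += 1
        return bit


def parse_vps_vui_flags(reader):
    """Parse the timing flag, then the gating flag only if timing info is present."""
    timing_present = reader.read_flag()       # vps_timing_info_present_flag
    if timing_present:
        hrd_present = reader.read_flag()      # vps_vui_bsp_hrd_present_flag
    else:
        hrd_present = 0                       # absent from the bitstream: inferred to be 0
    return {
        "vps_timing_info_present_flag": timing_present,
        "vps_vui_bsp_hrd_present_flag": hrd_present,
    }
```

When the timing flag is 0, the reader consumes a single bit, which reflects the bit saving obtained by not signaling the gating flag.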

In another example, a value of the gating flag may be dependent on the presence of the timing information. For example, a constraint may be added to express that when the timing information syntax element 902 is equal to 0, the value of the gating flag syntax element 904 shall also be equal to 0. This example is illustrated in FIG. 9B by the absence of the condition 906 from the syntax structure 900. In this example, a timing information syntax element indicating whether timing information is present in the VPS or the VPS VUI is signaled earlier in the VPS or the VPS VUI (not shown in FIG. 9). When the timing information syntax element (not shown in FIG. 9) is set to a value of 0 (indicating that no timing information is present), the encoder (or other device, such as an editor, splicer, or the like) may be forced to set the gating flag syntax element 904 to a value of 0, indicating that no hypothetical reference decoder parameters are signaled in the VPS VUI. The encoder (or other device, such as an editor, splicer, or the like) may determine not to signal hypothetical reference decoder parameters in the VPS VUI as a result. When the VPS VUI is received by the decoder (or other device receiving the VPS VUI), the decoder determines that the value of the gating flag syntax element 904 is set to a value of 0 to learn that no hypothetical reference decoder parameters are signaled in the VPS VUI.
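The constraint-based variant, in which the gating flag is always signaled but its value is restricted, can be sketched as a pair of small helpers. The function names are hypothetical; the logic encodes only the constraint stated above, namely that the gating flag shall be 0 whenever the timing information flag is 0.

```python
def choose_gating_flag(timing_info_present, wants_hrd_parameters):
    """Encoder side: pick a gating flag value that satisfies the constraint.

    The flag may be 1 only when timing information is present; otherwise it
    is forced to 0 regardless of what the encoder would prefer.
    """
    return 1 if (timing_info_present and wants_hrd_parameters) else 0


def gating_flag_is_conformant(timing_info_present, hrd_present):
    """Checker side: the only disallowed combination is timing=0 with gating=1."""
    return not (timing_info_present == 0 and hrd_present == 1)
```

Unlike the conditional-presence variant, this approach spends one bit on the gating flag in every case and relies on the constraint, rather than on inference, to rule out the wasteful combination.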

FIG. 10 illustrates an embodiment of a process 1000 of encoding video data. The process 1000 is implemented to signal hypothetical reference decoder parameters in a parameter set in only certain situations. In some aspects, the process 1000 may be performed by a computing device or an apparatus, such as the encoding device 104 shown in FIG. 1 or in FIG. 16. For example, the computing device or apparatus may include an encoder, or a processor, microprocessor, microcomputer, or other component of an encoder that is configured to carry out the steps of process 1000.

Process 1000 is illustrated as a logical flow diagram, the operation of which represents a sequence of operations that can be implemented in hardware, computer instructions, or a combination thereof. In the context of computer instructions, the operations represent computer-executable instructions stored on one or more computer-readable storage media that, when executed by one or more processors, perform the recited operations. Generally, computer-executable instructions include routines, programs, objects, components, data structures, and the like that perform particular functions or implement particular data types. The order in which the operations are described is not intended to be construed as a limitation, and any number of the described operations can be combined in any order and/or in parallel to implement the processes.

Additionally, the process 1000 may be performed under the control of one or more computer systems configured with executable instructions and may be implemented as code (e.g., executable instructions, one or more computer programs, or one or more applications) executing collectively on one or more processors, by hardware, or combinations thereof. As noted above, the code may be stored on a computer-readable or machine-readable storage medium, for example, in the form of a computer program comprising a plurality of instructions executable by one or more processors. The computer-readable or machine-readable storage medium may be non-transitory.

At 1002, the process 1000 of encoding video data includes generating an encoded video bitstream comprising multiple layers. The encoded video bitstream includes a video parameter set defining parameters of the encoded video bitstream. The video parameter set includes video usability information, which may be referred to as a VPS VUI. The encoded video bitstream may be encoded using an HEVC coding technique, or other suitable coding technique.

At 1004, the process 1000 includes determining whether timing information is signaled in the video usability information of the video parameter set. In some embodiments, determining whether the timing information is signaled in the video usability information of the video parameter set includes determining a value of a first flag in the video usability information. The first flag indicates whether the timing information is signaled in the video usability information (or other portion of the video parameter set). For example, the first flag may include a timing information syntax element (e.g., timing information syntax element 902). The timing information syntax element may be checked to determine if timing information is signaled. For example, a value of 0 may indicate that timing information is not signaled. In another example, a value of 1 may indicate that timing information is not signaled.

At 1006, the process 1000 includes determining whether to signal hypothetical reference decoder parameters in the video usability information of the video parameter set based on whether timing information is signaled in the video usability information (or other portion of the video parameter set). In some examples, the process 1000 includes signaling the hypothetical reference decoder parameters in the video usability information when timing information is signaled in the video usability information (or other portion of the video parameter set). The process 1000 further includes not signaling the hypothetical reference decoder parameters in the video usability information when timing information is not signaled in the video usability information (or other portion of the video parameter set). For example, an encoder or other network device may make a determination to not signal the hypothetical reference decoder parameters in the video usability information when the timing information is absent.

In some embodiments, the process 1000 includes determining a value of a second flag in the video usability information based on the value of the first flag. The second flag defines whether hypothetical reference decoder parameters are signaled in the video usability information. For example, the second flag may include a gating flag syntax element (e.g., gating flag syntax element 904).

In some embodiments, the process 1000 includes providing, in the video usability information, one or more syntax elements for signaling information related to the encoded video bitstream, the information including a condition that the value of the second flag is dependent on the value of the first flag. For example, referring to FIG. 9, when the timing information syntax element 902 is set to a value of 0 (indicating that no timing information is present), the gating flag syntax element 904 may not be signaled in the VPS VUI (in which case the syntax structure 900 does not include the gating flag syntax element 904). The value of the gating flag syntax element 904 is then inferred by the encoder to be a value of 0, indicating that no hypothetical reference decoder parameters are to be signaled in the VPS VUI. The encoder may make a determination not to signal hypothetical reference decoder parameters in the VPS VUI.

In some embodiments, the process 1000 includes providing, in the video usability information, one or more syntax elements for signaling information related to the encoded video bitstream, the information including a constraint that the value of the second flag is to be set to zero when the value of the first flag is equal to zero. For example, as illustrated in FIG. 9, the condition 906 with the gating flag syntax element 904 may be added to the syntax structure 900. Based on the condition 906, when the timing information syntax element 902 is set to a value of 0 (indicating that no timing information is present), the encoder may set the gating flag syntax element 904 to a value of 0, indicating that no hypothetical reference decoder parameters are signaled in the VPS VUI. The encoder may determine not to signal hypothetical reference decoder parameters in the VPS VUI as a result.

In some aspects, the process 1000 is executable on a wireless communication device. The wireless communication device may include a memory configured to store the video data. The memory may include storage 108 shown in FIG. 1. The wireless communication device may also include a processor configured to execute instructions to process the video data stored in the memory. The processor may include the encoder engine 106 shown in FIG. 1, or another suitable processor for processing video data. The wireless communication device may further include a transmitter configured to transmit the encoded video bitstream including the video parameter set. The transmitter may be a wireless transmitter, or may be part of a wireless transceiver. In some aspects, the wireless communication device is a cellular telephone and the encoded video bitstream is modulated according to a cellular communication standard. For example, the encoded video bitstream may be modulated using a modulator (e.g., a Quadrature Phase Shift modulator, a quadrature phase shift key modulator, an orthogonal frequency-division multiplexing modulator, or any other suitable modulator, or a combination thereof).

The above-described techniques prevent signaling of hypothetical reference decoder information when timing information is not present. Signaling of such information when no timing information is present is an inefficient use of resources, wasting valuable processing and network resources. The encoder (or other device, such as an editor, splicer, or the like) may intelligently determine when to signal hypothetical reference decoder parameters based on the presence or absence of timing information.

In further embodiments, techniques and systems are described for selectively signaling different numbers of video signal information syntax structures in a parameter set. For example, embodiments are described herein for determining a number of video signal information syntax structures to signal in the parameter set based on whether the base layer is included in an encoded video bitstream or to be provided to a decoding device from an external source.

FIG. 11 illustrates an example environment 1100 in which an encoding device generates various layers of an encoded video bitstream, including a base layer. The environment 1100 includes an HEVC encoding device 1102 that generates an encoded video bitstream using the HEVC video coding standard. One of ordinary skill in the art will appreciate that the techniques described herein apply to other encoding devices that may use different coding standards than the HEVC standard, such as one or more of the AVC and MPEG standards. The HEVC encoding device 1102 may generate an HEVC compliant video bitstream that includes a base layer and one or more enhancement layers. For example, the HEVC encoding device 1102 may generate base layer 0 and enhancement layer 1 to layer n. Layer n refers to the fact that the HEVC encoding device 1102 can generate any number of enhancement layers, as determined by the particular implementation or application and as constrained by the HEVC standard.

The HEVC decoding device 1104 of the receiving device 1110 may receive the base and enhancement layers from the HEVC encoding device 1102. In the example of FIG. 11, the base layer is provided to the HEVC decoding device 1104 in the HEVC bitstream. The HEVC encoding device 1102 may also send parameter sets, such as a VPS, to the HEVC decoding device 1104 with information allowing the HEVC decoding device 1104 to properly decode the encoded video bitstream. The information may include video signal information, as described below.

FIG. 12 illustrates an example environment 1200 in which an encoding device generates various enhancement layers of an encoded video bitstream, but not a base layer. The environment 1200 includes an HEVC encoding device 1202 and an AVC encoding device 1206 that generate encoded video bitstreams using different video coding standards. One of ordinary skill in the art will appreciate that the techniques described herein apply to other encoding devices that may use different coding standards than HEVC or AVC. The HEVC encoding device 1202 may generate an HEVC compliant video bitstream that includes one or more enhancement layers but no base layer. For example, the HEVC encoding device 1202 may generate enhancement layer 1 to layer n. The AVC encoding device 1206 may generate an AVC compliant video bitstream that includes only a base layer, including base layer 0. When the HEVC encoding device 1202 generates the one or more enhancement layers, the base layer generated by the AVC encoding device 1206 may be used for inter-layer prediction reference.

In one example, the HEVC decoding device 1204 may receive the enhancement layers from the HEVC encoding device 1202, and the AVC decoding device 1208 may receive the base layer from the AVC encoding device 1206. In another example, a first network entity (e.g., an editor or splicer) may splice the enhancement layers from the HEVC encoding device 1202 together with the base layer from the AVC encoding device 1206. The first network entity may perform the splicing in a timely synchronous manner with system time information being added (e.g., in a file format according to the ISO base media file format). A second network entity (e.g., a receiver, such as receiving device 1210, a file format parser, or other network entity) may pass the bitstream of the one or more enhancement layers to the HEVC decoding device 1204 and the bitstream of the base layer to the AVC decoding device 1208. In either example, the bitstream of the base layer is not provided to the HEVC decoding device 1204. Instead, the decoded pictures of the base layer are provided to the HEVC decoding device 1204 (from the AVC decoding device 1208) for inter-layer prediction reference. From the point of view of the HEVC decoding device 1204, the base layer is externally provided by an external source. In some embodiments, the HEVC decoding device 1204 and the AVC decoding device 1208 are separate decoders. In some embodiments, the HEVC decoding device 1204 and the AVC decoding device 1208 are part of a multi-standard decoder that can decode HEVC and AVC bitstreams.

An HEVC encoding device may provide a video parameter set (VPS) with an HEVC compliant video bitstream (e.g., in one or more non-VCL NAL units). A video signal information syntax structure is signaled in the VPS for each layer of a multi-layer encoded video bitstream, with a separate video signal information syntax structure being signaled for each layer. The video signal information syntax structures may be signaled in the VPS VUI of the VPS extension, and can be used to prepare the decoded video for output and display. Video signal information contained in a video signal information syntax structure may include color characteristics, such as color primaries, transfer characteristics, used color conversion matrix coefficients, or other suitable color information. Video signal information may also include video signal type information indicating the original format of the source video (e.g., NTSC, PAL, component, SECAM, MAC, unspecified, or other suitable video format) and, in some cases, a corresponding color format definition and format specification. In some cases, the video signal information may indicate locations of chroma samples in relation to locations of luma samples, which can be used to present a correct color presentation during display.

FIG. 13 illustrates an example of a VPS 1302 that can be sent by an HEVC encoding device along with the HEVC compliant video bitstream. The VPS 1302 includes video signal information for multiple layers of an encoded video bitstream. The video signal information may be contained in one or more video signal information syntax structures of a VPS VUI portion of the VPS 1302. For example, the VPS 1302 includes a video signal information syntax structure 1304 for a layer with layer ID=0 (corresponding to a base layer), a video signal information syntax structure 1306 for an enhancement layer with layer ID=1, and a video signal information syntax structure 1308 for an enhancement layer with layer ID=n.

In some cases, a number of video signal information syntax structures to include (or that is included) in the VPS 1302 is not explicitly signaled. For example, a syntax element (e.g., vps_num_video_signal_info_minus1) that indicates the number of video signal information syntax structures to include in the VPS 1302 may not be present. In such cases, the number of video signal information syntax structures to include in the VPS 1302 is inferred to be equal to the total number of layers in the bitstream (regardless of whether the base layer is provided externally or included in the HEVC encoded video bitstream), leading to one video signal information syntax structure being signaled for each layer ID value, and each layer being assigned to a signaled video signal information syntax structure according to its layer ID value. When the base layer is provided externally (e.g., by an AVC encoding device, as shown in FIG. 12), a video signal information syntax structure is sent that is useless with respect to the HEVC decoder because the HEVC decoder does not need a video signal information syntax structure for the base layer.

Techniques are described for updating the signaling of the video signal information syntax structures in the VPS (e.g., in the VPS VUI) to more efficiently provide data in the VPS. For example, a number of video signal information syntax structures to signal in the VPS is determined based on whether the base layer is included in the encoded video bitstream or to be provided to an HEVC decoding device from an external source. The signaling of the video signal information in the VPS may be updated when the number of video signal information syntax structures in the VPS VUI is not signaled explicitly (e.g., when a syntax element, such as vps_num_video_signal_info_minus1, is not present in the VPS or VPS VUI). For example, the number of video signal information syntax structures signaled in the VPS is inferred to be equal to the maximum number of layers of the bitstream if the base layer is in the HEVC bitstream (not provided externally as shown in FIG. 11). In embodiments in which the base layer is provided externally (as shown in FIG. 12), the number of video signal information syntax structures signaled in the VPS is inferred to be equal to the maximum number of layers of the bitstream minus one. Accordingly, when the base layer is provided from an external source, the number of video signal information syntax structures in the VPS is reduced by one.

In some embodiments, the layer IDs of the layers are mapped to video signal information syntax structures in an index to indicate which syntax structures will apply to the different layers. In such embodiments, when the number of video signal information syntax structures in the VPS is not signaled explicitly, the mapping between layer ID to the index of video signal information syntax structures is updated so that no video signal information syntax structure is assigned to the base layer. Accordingly, a video signal information syntax structure is assigned to each of the layers included in the HEVC encoded video bitstream, and no video signal information syntax structure is assigned to the base layer that is to be provided to the decoder from the external source.

Changes to the HEVC standard to implement the above-described techniques for updating the signaling of the video signal information syntax structures in the VPS may include:

-   video_signal_info_idx_present_flag equal to 1 specifies that the syntax elements vps_num_video_signal_info_minus1 and vps_video_signal_info_idx[i] are present. video_signal_info_idx_present_flag equal to 0 specifies that the syntax elements vps_num_video_signal_info_minus1 and vps_video_signal_info_idx[i] are not present.
-   vps_num_video_signal_info_minus1 plus 1 specifies the number of the following video_signal_info( ) syntax structures in the VPS. When not present, the value of vps_num_video_signal_info_minus1 is inferred to be equal to MaxLayersMinus1 − (vps_base_layer_internal_flag ? 0 : 1).
-   vps_video_signal_info_idx[i] specifies the index, into the list of video_signal_info( ) syntax structures in the VPS, of the video_signal_info( ) syntax structure that applies to the layer with nuh_layer_id equal to layer_id_in_nuh[i]. When vps_video_signal_info_idx[i] is not present, vps_video_signal_info_idx[i] is inferred to be equal to (video_signal_info_idx_present_flag ? 0 : i). The value of vps_video_signal_info_idx[i] shall be in the range of 0 to vps_num_video_signal_info_minus1, inclusive.
-   When not present, the value of vps_video_signal_info_idx[i] is inferred as follows:
    -   If video_signal_info_idx_present_flag is equal to 1, vps_video_signal_info_idx[i] is inferred to be equal to 0.
    -   Otherwise, vps_video_signal_info_idx[i] is inferred to be equal to i − (vps_base_layer_internal_flag ? 0 : 1).
-   vps_vui_bsp_hrd_present_flag equal to 0 specifies that no bitstream partition HRD parameters are present in the VPS VUI. vps_vui_bsp_hrd_present_flag equal to 1 specifies that bitstream partition HRD parameters are present in the VPS VUI. When not present, vps_vui_bsp_hrd_present_flag is inferred to be equal to 0.
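The inference rules listed above for the case in which the count and index syntax elements are not signaled (video_signal_info_idx_present_flag equal to 0) can be sketched as follows. The function name and return shape are hypothetical, and the sketch assumes that the index mapping is only derived for layers carried in the bitstream, so that an externally provided base layer (i equal to 0) receives no entry.

```python
def infer_video_signal_info(max_layers_minus1, base_layer_internal):
    """Infer the structure count and the layer-to-structure index mapping.

    Covers only the case where vps_num_video_signal_info_minus1 and
    vps_video_signal_info_idx[i] are absent. Returns a tuple of
    (number of video_signal_info() structures, {layer index i: structure index}).
    """
    offset = 0 if base_layer_internal else 1
    # vps_num_video_signal_info_minus1 = MaxLayersMinus1 - (internal ? 0 : 1)
    num_minus1 = max_layers_minus1 - offset
    # Assumption: layers are visited starting at 1 when the base layer is external.
    idx = {i: i - offset for i in range(offset, max_layers_minus1 + 1)}
    return num_minus1 + 1, idx
```

For a four-layer bitstream (MaxLayersMinus1 equal to 3), this yields four structures when the base layer is internal and three when it is external, with the enhancement layers shifted down by one index in the external case.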

FIG. 14 illustrates an embodiment of a process 1400 of encoding video data. The process 1400 is implemented to update the signaling of the video signal information syntax structures in the VPS by selectively signaling different numbers of video signal information syntax structures in the VPS. In some aspects, the process 1400 may be performed by a computing device or an apparatus, such as the encoding device 104 shown in FIG. 1 or in FIG. 16. For example, the computing device or apparatus may include an encoder, or a processor, microprocessor, microcomputer, or other component of an encoder that is configured to carry out the steps of process 1400.

Process 1400 is illustrated as a logical flow diagram, the operation of which represents a sequence of operations that can be implemented in hardware, computer instructions, or a combination thereof. In the context of computer instructions, the operations represent computer-executable instructions stored on one or more computer-readable storage media that, when executed by one or more processors, perform the recited operations. Generally, computer-executable instructions include routines, programs, objects, components, data structures, and the like that perform particular functions or implement particular data types. The order in which the operations are described is not intended to be construed as a limitation, and any number of the described operations can be combined in any order and/or in parallel to implement the processes.

Additionally, the process 1400 may be performed under the control of one or more computer systems configured with executable instructions and may be implemented as code (e.g., executable instructions, one or more computer programs, or one or more applications) executing collectively on one or more processors, by hardware, or combinations thereof. As noted above, the code may be stored on a computer-readable or machine-readable storage medium, for example, in the form of a computer program comprising a plurality of instructions executable by one or more processors. The computer-readable or machine-readable storage medium may be non-transitory.

At 1402, the process 1400 of encoding video data includes generating an encoded video bitstream according to a first coding protocol. The encoded video bitstream includes one or more enhancement layers and a video parameter set defining parameters of the encoded video bitstream. In some embodiments, the encoded video bitstream may be encoded using an HEVC coding technique, or other suitable coding technique.

At 1404, the process 1400 includes determining that a syntax elementindicative of a number of video_signal_information syntax structuresprovided in the encoded video bitstream is not present in the videoparameter set. For example, an encoder may determine that a syntaxelement (e.g., vps_num_video signal_info_minus1) that indicates thenumber of video signal information syntax structures to include in thevideo parameter set is not present in the video parameter set (e.g., theVPS or VPS VUI).

At 1406, the process 1400 includes determining the number ofvideo_signal_information syntax structures to include in the videoparameter set when the syntax element indicative of the number ofvideo_signal_information syntax structures provided in the encoded videobitstream is not present in the video parameter set. The number isdetermined as a first value or a second value based on whether a baselayer is included in the encoded video bitstream or to be provided to adecoder from an external source. In some embodiments, the number ofvideo signal information syntax structures to include in the videoparameter set is determined as the first value when it is determinedthat the base layer is included in the encoded video bitstream, in whichcase the first value is equal to a maximum number of layers of theencoded video bitstream.

In some embodiments, the number of video_signal_information syntax structures to include in the video parameter set is determined as the second value when it is determined that the base layer is to be provided to the decoder from the external source, in which case the second value is equal to a maximum number of layers of the encoded video bitstream minus one. In some embodiments, a video_signal_information syntax structure is assigned to each of the layers included in the encoded video bitstream, and no video_signal_information syntax structure is assigned to the base layer that is to be provided to the decoder from the external source. In some embodiments, the base layer provided from the external source is encoded according to a second coding protocol, the second coding protocol being different than the first coding protocol. In some examples, the first coding protocol includes a high efficiency video coding protocol, and the second coding protocol includes an advanced video coding protocol.
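The rule described at 1406 can be illustrated with a minimal sketch. The function name and parameter names below are hypothetical and are not taken from any codec implementation; the sketch only assumes that the maximum number of layers and the location of the base layer are already known.

```python
def infer_num_video_signal_info(max_layers: int, base_layer_internal: bool) -> int:
    """Number of video_signal_information structures to assume when the
    count syntax element is absent from the video parameter set.

    max_layers          -- maximum number of layers of the bitstream (hypothetical input)
    base_layer_internal -- True if the base layer is carried in the bitstream,
                           False if it is provided by an external source
    """
    if max_layers < 1:
        raise ValueError("max_layers must be at least 1")
    # First value: one structure per layer, including the base layer.
    if base_layer_internal:
        return max_layers
    # Second value: the externally provided base layer needs no structure.
    return max_layers - 1
```

For a three-layer bitstream, the sketch yields three structures when the base layer is in the bitstream and two when the base layer comes from an external source.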

FIG. 15 illustrates an embodiment of a process 1500 of decoding video data. The process 1500 is implemented to infer a number of video_signal_information syntax structures in the VPS. In some aspects, the process 1500 may be performed by a computing device or an apparatus, such as the decoding device 112 shown in FIG. 1 or in FIG. 17. For example, the computing device or apparatus may include a decoder, or a processor, microprocessor, microcomputer, or other component of a decoder that is configured to carry out the steps of process 1500.

Process 1500 is illustrated as a logical flow diagram, the operation of which represents a sequence of operations that can be implemented in hardware, computer instructions, or a combination thereof. In the context of computer instructions, the operations represent computer-executable instructions stored on one or more computer-readable storage media that, when executed by one or more processors, perform the recited operations. Generally, computer-executable instructions include routines, programs, objects, components, data structures, and the like that perform particular functions or implement particular data types. The order in which the operations are described is not intended to be construed as a limitation, and any number of the described operations can be combined in any order and/or in parallel to implement the processes.

Additionally, the process 1500 may be performed under the control of one or more computer systems configured with executable instructions and may be implemented as code (e.g., executable instructions, one or more computer programs, or one or more applications) executing collectively on one or more processors, by hardware, or combinations thereof. As noted above, the code may be stored on a computer-readable or machine-readable storage medium, for example, in the form of a computer program comprising a plurality of instructions executable by one or more processors. The computer-readable or machine-readable storage medium may be non-transitory.

At 1502, the process 1500 of decoding video data includes accessing an encoded video bitstream encoded according to a first coding protocol. The encoded video bitstream includes one or more enhancement layers and a video parameter set defining parameters of the encoded video bitstream. In some embodiments, the encoded video bitstream may be encoded using an HEVC coding technique, or other suitable coding technique.

At 1504, the process 1500 includes determining that a syntax element indicative of a number of video_signal_information syntax structures provided in the encoded video bitstream is not present in the video parameter set. For example, a decoder may determine that a syntax element (e.g., vps_num_video_signal_info_minus1) that indicates the number of video signal information syntax structures to include in the video parameter set is not present in the video parameter set.

At 1506, the process 1500 includes determining whether a base layer is included in the encoded video bitstream or to be received from an external source. For example, the determination of whether a base layer is included in the encoded video bitstream or to be received from an external source may be based on an indication provided to a decoder. The indication may be conveyed through a syntax element of the VPS. In one example, a syntax structure of the VPS may include a flag with a value (e.g., 1 or 0) indicating to the decoder that the base layer is included in the encoded video bitstream. In another example, a syntax structure of the VPS may include a flag with a value (e.g., 1 or 0) indicating to the decoder that the base layer is to be received from an external source.

At 1508, the process 1500 includes determining the number of video_signal_information syntax structures included in the video parameter set to be a first value or a second value based on whether the base layer is included in the encoded video bitstream or to be received from the external source. In some embodiments, the process 1500 includes determining the number of video_signal_information syntax structures to be the first value when it is determined that the base layer is included in the encoded video bitstream, in which case the first value is equal to a maximum number of layers of the encoded video bitstream.

In some embodiments, the process 1500 includes determining the number of video_signal_information syntax structures to be the second value when it is determined that the base layer is to be received from the external source, in which case the second value is equal to a maximum number of layers of the encoded video bitstream minus one. In some embodiments, a video_signal_information syntax structure is assigned to each of the layers included in the encoded video bitstream, and no video_signal_information syntax structure is assigned to the base layer that is to be received from the external source. In some embodiments, the base layer provided from the external source is encoded according to a second coding protocol, the second coding protocol being different than the first coding protocol. In some examples, the first coding protocol includes a high efficiency video coding protocol, and the second coding protocol includes an advanced video coding protocol.
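As an illustration of the assignment just described, the following sketch maps each layer to a video_signal_information structure index on the decoder side. The function name, the dictionary representation, and the one-to-one, in-order assignment are assumptions made for illustration only; the description states only that each layer in the bitstream is assigned a structure and that an externally provided base layer is not.

```python
from typing import Dict, Optional


def assign_video_signal_info(max_layers: int,
                             base_layer_internal: bool) -> Dict[int, Optional[int]]:
    """Map layer index -> structure index (hypothetical representation).

    A layer mapped to None has no video_signal_information structure,
    which here is only the base layer (layer 0) when it is received
    from an external source.
    """
    mapping: Dict[int, Optional[int]] = {}
    next_index = 0
    for layer in range(max_layers):
        if layer == 0 and not base_layer_internal:
            mapping[layer] = None  # external base layer: nothing assigned
            continue
        mapping[layer] = next_index  # assumed in-order, one structure per layer
        next_index += 1
    return mapping
```

Under these assumptions, the number of non-None entries equals the maximum number of layers when the base layer is in the bitstream, and the maximum number of layers minus one otherwise.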

The above-described techniques prevent signaling of superfluous video signal information syntax structures when the base layer is provided by an external source. Signaling of such information even when the base layer is encoded according to a separate protocol leads to inefficiencies because the extra video_signal_information syntax structures are not needed.

The coding techniques discussed herein may be implemented in an example video encoding and decoding system (e.g., system 100). A system includes a source device that provides encoded video data to be decoded at a later time by a destination device. In particular, the source device provides the video data to the destination device via a computer-readable medium. The source device and the destination device may comprise any of a wide range of devices, including desktop computers, notebook (i.e., laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called “smart” phones, so-called “smart” pads, televisions, cameras, display devices, digital media players, video gaming consoles, video streaming devices, or the like. In some cases, the source device and the destination device may be equipped for wireless communication.

The destination device may receive the encoded video data to be decoded via the computer-readable medium. The computer-readable medium may comprise any type of medium or device capable of moving the encoded video data from source device to destination device. In one example, computer-readable medium may comprise a communication medium to enable source device to transmit encoded video data directly to destination device in real-time. The encoded video data may be modulated according to a communication standard, such as a wireless communication protocol, and transmitted to destination device. The communication medium may comprise any wireless or wired communication medium, such as a radio frequency (RF) spectrum or one or more physical transmission lines. The communication medium may form part of a packet-based network, such as a local area network, a wide-area network, or a global network such as the Internet. The communication medium may include routers, switches, base stations, or any other equipment that may be useful to facilitate communication from source device to destination device.

In some examples, encoded data may be output from output interface to a storage device. Similarly, encoded data may be accessed from the storage device by input interface. The storage device may include any of a variety of distributed or locally accessed data storage media such as a hard drive, Blu-ray discs, DVDs, CD-ROMs, flash memory, volatile or non-volatile memory, or any other suitable digital storage media for storing encoded video data. In a further example, the storage device may correspond to a file server or another intermediate storage device that may store the encoded video generated by source device. Destination device may access stored video data from the storage device via streaming or download. The file server may be any type of server capable of storing encoded video data and transmitting that encoded video data to the destination device. Example file servers include a web server (e.g., for a website), an FTP server, network attached storage (NAS) devices, or a local disk drive. Destination device may access the encoded video data through any standard data connection, including an Internet connection. This may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., DSL, cable modem, etc.), or a combination of both that is suitable for accessing encoded video data stored on a file server. The transmission of encoded video data from the storage device may be a streaming transmission, a download transmission, or a combination thereof.

The techniques of this disclosure are not necessarily limited to wireless applications or settings. The techniques may be applied to video coding in support of any of a variety of multimedia applications, such as over-the-air television broadcasts, cable television transmissions, satellite television transmissions, Internet streaming video transmissions, such as dynamic adaptive streaming over HTTP (DASH), digital video that is encoded onto a data storage medium, decoding of digital video stored on a data storage medium, or other applications. In some examples, the system may be configured to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.

In one example, the source device includes a video source, a video encoder, and an output interface. The destination device may include an input interface, a video decoder, and a display device. The video encoder of source device may be configured to apply the techniques disclosed herein. In other examples, a source device and a destination device may include other components or arrangements. For example, the source device may receive video data from an external video source, such as an external camera. Likewise, the destination device may interface with an external display device, rather than including an integrated display device.

The example system above is merely one example. Techniques for processing video data in parallel may be performed by any digital video encoding and/or decoding device. Although generally the techniques of this disclosure are performed by a video encoding device, the techniques may also be performed by a video encoder/decoder, typically referred to as a “CODEC.” Moreover, the techniques of this disclosure may also be performed by a video preprocessor. Source device and destination device are merely examples of such coding devices in which source device generates coded video data for transmission to destination device. In some examples, the source and destination devices may operate in a substantially symmetrical manner such that each of the devices includes video encoding and decoding components. Hence, example systems may support one-way or two-way video transmission between video devices, e.g., for video streaming, video playback, video broadcasting, or video telephony.

The video source may include a video capture device, such as a video camera, a video archive containing previously captured video, and/or a video feed interface to receive video from a video content provider. As a further alternative, the video source may generate computer graphics-based data as the source video, or a combination of live video, archived video, and computer-generated video. In some cases, if video source is a video camera, source device and destination device may form so-called camera phones or video phones. As mentioned above, however, the techniques described in this disclosure may be applicable to video coding in general, and may be applied to wireless and/or wired applications. In each case, the captured, pre-captured, or computer-generated video may be encoded by the video encoder. The encoded video information may then be output by output interface onto the computer-readable medium.

As noted, the computer-readable medium may include transient media, such as a wireless broadcast or wired network transmission, or storage media (that is, non-transitory storage media), such as a hard disk, flash drive, compact disc, digital video disc, Blu-ray disc, or other computer-readable media. In some examples, a network server (not shown) may receive encoded video data from the source device and provide the encoded video data to the destination device, e.g., via network transmission. Similarly, a computing device of a medium production facility, such as a disc stamping facility, may receive encoded video data from the source device and produce a disc containing the encoded video data. Therefore, the computer-readable medium may be understood to include one or more computer-readable media of various forms, in various examples.

The input interface of the destination device receives information from the computer-readable medium. The information of the computer-readable medium may include syntax information defined by the video encoder, which is also used by the video decoder, that includes syntax elements that describe characteristics and/or processing of blocks and other coded units, e.g., group of pictures (GOP). A display device displays the decoded video data to a user, and may comprise any of a variety of display devices such as a cathode ray tube (CRT), a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or another type of display device. Various embodiments of the invention have been described.

Specific details of the encoding device 104 and the decoding device 112 are shown in FIG. 16 and FIG. 17, respectively. FIG. 16 is a block diagram illustrating an example encoding device 104 that may implement one or more of the techniques described in this disclosure. Encoding device 104 may, for example, generate the syntax structures described herein (e.g., the syntax structures of a VPS, SPS, PPS, or other syntax elements). Encoding device 104 may perform intra-prediction and inter-prediction coding of video blocks within video slices. As previously described, intra-coding relies, at least in part, on spatial prediction to reduce or remove spatial redundancy within a given video frame or picture. Inter-coding relies, at least in part, on temporal prediction to reduce or remove temporal redundancy within adjacent or surrounding frames of a video sequence. Intra-mode (I mode) may refer to any of several spatial based compression modes. Inter-modes, such as uni-directional prediction (P mode) or bi-prediction (B mode), may refer to any of several temporal-based compression modes.

The encoding device 104 includes a partitioning unit 35, prediction processing unit 41, filter unit 63, picture memory 64, summer 50, transform processing unit 52, quantization unit 54, and entropy encoding unit 56. Prediction processing unit 41 includes motion estimation unit 42, motion compensation unit 44, and intra-prediction processing unit 46. For video block reconstruction, encoding device 104 also includes inverse quantization unit 58, inverse transform processing unit 60, and summer 62. Filter unit 63 is intended to represent one or more loop filters such as a deblocking filter, an adaptive loop filter (ALF), and a sample adaptive offset (SAO) filter. Although filter unit 63 is shown in FIG. 16 as being an in loop filter, in other configurations, filter unit 63 may be implemented as a post loop filter. A post processing device 57 may perform additional processing on encoded video data generated by encoding device 104. The techniques of this disclosure may in some instances be implemented by encoding device 104. In other instances, however, one or more of the techniques of this disclosure may be implemented by post processing device 57.

As shown in FIG. 16, encoding device 104 receives video data, and partitioning unit 35 partitions the data into video blocks. The partitioning may also include partitioning into slices, slice segments, tiles, or other larger units, as well as video block partitioning, e.g., according to a quadtree structure of LCUs and CUs. Encoding device 104 generally illustrates the components that encode video blocks within a video slice to be encoded. The slice may be divided into multiple video blocks (and possibly into sets of video blocks referred to as tiles). Prediction processing unit 41 may select one of a plurality of possible coding modes, such as one of a plurality of intra-prediction coding modes or one of a plurality of inter-prediction coding modes, for the current video block based on error results (e.g., coding rate and the level of distortion, or the like). Prediction processing unit 41 may provide the resulting intra- or inter-coded block to summer 50 to generate residual block data and to summer 62 to reconstruct the encoded block for use as a reference picture.

Intra-prediction processing unit 46 within prediction processing unit 41 may perform intra-prediction coding of the current video block relative to one or more neighboring blocks in the same frame or slice as the current block to be coded to provide spatial compression. Motion estimation unit 42 and motion compensation unit 44 within prediction processing unit 41 perform inter-predictive coding of the current video block relative to one or more predictive blocks in one or more reference pictures to provide temporal compression.

Motion estimation unit 42 may be configured to determine the inter-prediction mode for a video slice according to a predetermined pattern for a video sequence. The predetermined pattern may designate video slices in the sequence as P slices, B slices, or GPB slices. Motion estimation unit 42 and motion compensation unit 44 may be highly integrated, but are illustrated separately for conceptual purposes. Motion estimation, performed by motion estimation unit 42, is the process of generating motion vectors, which estimate motion for video blocks. A motion vector, for example, may indicate the displacement of a prediction unit (PU) of a video block within a current video frame or picture relative to a predictive block within a reference picture.

A predictive block is a block that is found to closely match the PU of the video block to be coded in terms of pixel difference, which may be determined by sum of absolute difference (SAD), sum of square difference (SSD), or other difference metrics. In some examples, encoding device 104 may calculate values for sub-integer pixel positions of reference pictures stored in picture memory 64. For example, encoding device 104 may interpolate values of one-quarter pixel positions, one-eighth pixel positions, or other fractional pixel positions of the reference picture. Therefore, motion estimation unit 42 may perform a motion search relative to the full pixel positions and fractional pixel positions and output a motion vector with fractional pixel precision.
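The two difference metrics named above can be sketched as follows. The function names and the list-of-rows block representation are illustrative choices, not part of the described encoding device.

```python
from typing import List

Block = List[List[int]]  # rows of pixel values (hypothetical representation)


def sad(a: Block, b: Block) -> int:
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b))


def ssd(a: Block, b: Block) -> int:
    """Sum of squared differences between two equally sized blocks."""
    return sum((x - y) ** 2 for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b))
```

A candidate with a lower SAD or SSD is a closer match to the block being coded; SSD penalizes large individual pixel errors more heavily than SAD does.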

Motion estimation unit 42 calculates a motion vector for a PU of a video block in an inter-coded slice by comparing the position of the PU to the position of a predictive block of a reference picture. The reference picture may be selected from a first reference picture list (List 0) or a second reference picture list (List 1), each of which identifies one or more reference pictures stored in picture memory 64. Motion estimation unit 42 sends the calculated motion vector to entropy encoding unit 56 and motion compensation unit 44.
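One simple way to obtain such a motion vector is an exhaustive integer-pixel search over a small window, sketched below. This is an assumption for illustration: the description does not specify the search strategy, and the function names, the SAD cost, and the (dx, dy) sign convention (displacement from the current block position to the matching reference position) are all hypothetical.

```python
from typing import List, Tuple

Frame = List[List[int]]  # rows of pixel values (hypothetical representation)


def block_at(frame: Frame, x: int, y: int, size: int) -> Frame:
    """Extract the size x size block whose top-left corner is (x, y)."""
    return [row[x:x + size] for row in frame[y:y + size]]


def sad(a: Frame, b: Frame) -> int:
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def full_search(cur: Frame, ref: Frame, x: int, y: int,
                size: int, search_range: int) -> Tuple[Tuple[int, int], int]:
    """Return ((dx, dy), cost) of the best integer-pixel match in the window."""
    target = block_at(cur, x, y, size)
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = x + dx, y + dy
            # Skip candidates that fall outside the reference picture.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(target, block_at(ref, rx, ry, size))
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    if best_mv is None:
        raise ValueError("no candidate block lies inside the reference picture")
    return best_mv, best_cost
```

Practical encoders typically use faster search patterns and refine the result to fractional-pixel precision; the exhaustive search is shown only because it is the easiest to verify.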

Motion compensation, performed by motion compensation unit 44, may involve fetching or generating the predictive block based on the motion vector determined by motion estimation, possibly performing interpolations to sub-pixel precision. Upon receiving the motion vector for the PU of the current video block, motion compensation unit 44 may locate the predictive block to which the motion vector points in a reference picture list. Encoding device 104 forms a residual video block by subtracting pixel values of the predictive block from the pixel values of the current video block being coded, forming pixel difference values. The pixel difference values form residual data for the block, and may include both luma and chroma difference components. Summer 50 represents the component or components that perform this subtraction operation. Motion compensation unit 44 may also generate syntax elements associated with the video blocks and the video slice for use by decoding device 112 in decoding the video blocks of the video slice.
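The subtraction attributed to summer 50 amounts to an element-wise difference, as in the following sketch for a single color component. The function name and block representation are hypothetical.

```python
from typing import List

Block = List[List[int]]  # rows of pixel values for one component (luma or chroma)


def residual_block(current: Block, predictive: Block) -> Block:
    """Pixel difference values: current block minus predictive block."""
    return [[c - p for c, p in zip(row_c, row_p)]
            for row_c, row_p in zip(current, predictive)]
```

The residual values may be negative, so they need a signed representation even when the pixel values themselves are unsigned.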

Intra-prediction processing unit 46 may intra-predict a current block, as an alternative to the inter-prediction performed by motion estimation unit 42 and motion compensation unit 44, as described above. In particular, intra-prediction processing unit 46 may determine an intra-prediction mode to use to encode a current block. In some examples, intra-prediction processing unit 46 may encode a current block using various intra-prediction modes, e.g., during separate encoding passes, and intra-prediction processing unit 46 may select an appropriate intra-prediction mode to use from the tested modes. For example, intra-prediction processing unit 46 may calculate rate-distortion values using a rate-distortion analysis for the various tested intra-prediction modes, and may select the intra-prediction mode having the best rate-distortion characteristics among the tested modes. Rate-distortion analysis generally determines an amount of distortion (or error) between an encoded block and an original, unencoded block that was encoded to produce the encoded block, as well as a bit rate (that is, a number of bits) used to produce the encoded block. Intra-prediction processing unit 46 may calculate ratios from the distortions and rates for the various encoded blocks to determine which intra-prediction mode exhibits the best rate-distortion value for the block.
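A rate-distortion mode decision can be sketched as follows. Note the assumption: the sketch uses the widely used Lagrangian cost J = D + λ·R to combine distortion and rate, which is one common formulation and is not stated in this description (which refers only to calculating ratios from the distortions and rates). The function name, the mode labels, and the numbers in the usage note are hypothetical.

```python
from typing import Dict, Tuple


def select_mode(candidates: Dict[str, Tuple[float, float]], lam: float) -> str:
    """Pick the tested mode with the lowest assumed cost D + lam * R.

    candidates -- mode name -> (distortion, bits) measured in a trial encode
    lam        -- Lagrange multiplier trading distortion against rate
    """
    if not candidates:
        raise ValueError("at least one tested mode is required")
    return min(candidates,
               key=lambda mode: candidates[mode][0] + lam * candidates[mode][1])
```

For example, with tested modes `{"DC": (120, 10), "planar": (100, 18), "angular_26": (60, 40)}`, a multiplier of 0 selects the lowest-distortion mode regardless of bits, while a large multiplier favors the cheapest mode to signal.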

In any case, after selecting an intra-prediction mode for a block, intra-prediction processing unit 46 may provide information indicative of the selected intra-prediction mode for the block to entropy encoding unit 56. Entropy encoding unit 56 may encode the information indicating the selected intra-prediction mode. Encoding device 104 may include in the transmitted bitstream configuration data definitions of encoding contexts for various blocks as well as indications of a most probable intra-prediction mode, an intra-prediction mode index table, and a modified intra-prediction mode index table to use for each of the contexts. The bitstream configuration data may include a plurality of intra-prediction mode index tables and a plurality of modified intra-prediction mode index tables (also referred to as codeword mapping tables).

After prediction processing unit 41 generates the predictive block for the current video block via either inter-prediction or intra-prediction, encoding device 104 forms a residual video block by subtracting the predictive block from the current video block. The residual video data in the residual block may be included in one or more TUs and applied to transform processing unit 52. Transform processing unit 52 transforms the residual video data into residual transform coefficients using a transform, such as a discrete cosine transform (DCT) or a conceptually similar transform. Transform processing unit 52 may convert the residual video data from a pixel domain to a transform domain, such as a frequency domain.
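The pixel-domain to frequency-domain conversion can be illustrated with a floating-point, orthonormal 2-D DCT-II applied separably to rows and then columns. This is a textbook sketch under stated assumptions: practical codecs generally use integer approximations of the transform, and the function names here are hypothetical.

```python
import math
from typing import List


def dct_1d(values: List[float]) -> List[float]:
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(values)
    out = []
    for k in range(n):
        acc = sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                  for i, v in enumerate(values))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * acc)
    return out


def dct_2d(block: List[List[float]]) -> List[List[float]]:
    """Separable 2-D DCT: transform every row, then every column."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    # cols is indexed [horizontal frequency][vertical frequency]; transpose back.
    return [list(row) for row in zip(*cols)]
```

A flat residual block concentrates all of its energy in the single top-left (DC) coefficient, which is why smooth residuals are cheap to code after the transform.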

Transform processing unit 52 may send the resulting transform coefficients to quantization unit 54. Quantization unit 54 quantizes the transform coefficients to further reduce bit rate. The quantization process may reduce the bit depth associated with some or all of the coefficients. The degree of quantization may be modified by adjusting a quantization parameter.
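The effect of the quantization parameter can be sketched with a uniform scalar quantizer. The step-size relation used below, in which the step doubles for every increase of 6 in the quantization parameter, is the approximate relation commonly cited for AVC and HEVC and is an assumption here, as are the function names; real encoders implement this with integer scaling tables and may apply scaling lists and rounding offsets.

```python
import math
from typing import List


def qstep(qp: int) -> float:
    """Approximate quantization step size (assumed: doubles every 6 QP)."""
    return 2.0 ** ((qp - 4) / 6.0)


def quantize(coeffs: List[List[float]], qp: int) -> List[List[int]]:
    """Divide by the step size and round half away from zero."""
    step = qstep(qp)
    return [[int(math.copysign(math.floor(abs(c) / step + 0.5), c)) for c in row]
            for row in coeffs]


def dequantize(levels: List[List[int]], qp: int) -> List[List[float]]:
    """Scale quantized levels back to approximate coefficient values."""
    step = qstep(qp)
    return [[level * step for level in row] for row in levels]
```

Raising the quantization parameter enlarges the step, so more coefficients collapse to small levels or zero; this lowers the bit rate at the cost of a larger reconstruction error after dequantization.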

In some examples, quantization unit 54 may then perform a scan of the matrix including the quantized transform coefficients. Alternatively, entropy encoding unit 56 may perform the scan.

Following quantization, entropy encoding unit 56 entropy encodes the quantized transform coefficients. For example, entropy encoding unit 56 may perform context adaptive variable length coding (CAVLC), context adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding or another entropy encoding technique. Following the entropy encoding by entropy encoding unit 56, the encoded bitstream may be transmitted to decoding device 112, or archived for later transmission or retrieval by decoding device 112. Entropy encoding unit 56 may also entropy encode the motion vectors and the other syntax elements for the current video slice being coded.

Inverse quantization unit 58 and inverse transform processing unit 60 apply inverse quantization and inverse transformation, respectively, to reconstruct the residual block in the pixel domain for later use as a reference block of a reference picture. Motion compensation unit 44 may calculate a reference block by adding the residual block to a predictive block of one of the reference pictures within a reference picture list. Motion compensation unit 44 may also apply one or more interpolation filters to the reconstructed residual block to calculate sub-integer pixel values for use in motion estimation. Summer 62 adds the reconstructed residual block to the motion compensated prediction block produced by motion compensation unit 44 to produce a reference block for storage in picture memory 64. The reference block may be used by motion estimation unit 42 and motion compensation unit 44 as a reference block to inter-predict a block in a subsequent video frame or picture.

In this manner, encoding device 104 of FIG. 16 represents an example of a video encoder configured to generate syntax for an encoded video bitstream. Encoding device 104 may, for example, generate VPS, SPS, and PPS parameter sets as described above. The encoding device 104 may perform any of the techniques described herein, including the processes described above with respect to FIGS. 7, 8, 10, 14, and 15. The techniques of this disclosure have generally been described with respect to encoding device 104, but as mentioned above, some of the techniques of this disclosure may also be implemented by post processing device 57.

FIG. 17 is a block diagram illustrating an example decoding device 112. The decoding device 112 includes an entropy decoding unit 80, prediction processing unit 81, inverse quantization unit 86, inverse transform processing unit 88, summer 90, filter unit 91, and picture memory 92. Prediction processing unit 81 includes motion compensation unit 82 and intra prediction processing unit 84. Decoding device 112 may, in some examples, perform a decoding pass generally reciprocal to the encoding pass described with respect to encoding device 104 from FIG. 16.

During the decoding process, decoding device 112 receives an encoded video bitstream that represents video blocks of an encoded video slice and associated syntax elements sent by encoding device 104. In some embodiments, the decoding device 112 may receive the encoded video bitstream from the encoding device 104. In some embodiments, the decoding device 112 may receive the encoded video bitstream from a network entity 79, such as a server, a media-aware network element (MANE), a video editor/splicer, or other such device configured to implement one or more of the techniques described above. Network entity 79 may or may not include encoding device 104. Some of the techniques described in this disclosure may be implemented by network entity 79 prior to network entity 79 transmitting the encoded video bitstream to decoding device 112. In some video decoding systems, network entity 79 and decoding device 112 may be parts of separate devices, while in other instances, the functionality described with respect to network entity 79 may be performed by the same device that comprises decoding device 112.

The entropy decoding unit 80 of decoding device 112 entropy decodes the bitstream to generate quantized coefficients, motion vectors, and other syntax elements. Entropy decoding unit 80 forwards the motion vectors and other syntax elements to prediction processing unit 81. Decoding device 112 may receive the syntax elements at the video slice level and/or the video block level. Entropy decoding unit 80 may process and parse both fixed-length syntax elements and variable-length syntax elements in one or more parameter sets, such as a VPS, SPS, and PPS.

When the video slice is coded as an intra-coded (I) slice, intra prediction processing unit 84 of prediction processing unit 81 may generate prediction data for a video block of the current video slice based on a signaled intra-prediction mode and data from previously decoded blocks of the current frame or picture. When the video frame is coded as an inter-coded (i.e., B, P or GPB) slice, motion compensation unit 82 of prediction processing unit 81 produces predictive blocks for a video block of the current video slice based on the motion vectors and other syntax elements received from entropy decoding unit 80. The predictive blocks may be produced from one of the reference pictures within a reference picture list. Decoding device 112 may construct the reference frame lists, List 0 and List 1, using default construction techniques based on reference pictures stored in picture memory 92.

Motion compensation unit 82 determines prediction information for a video block of the current video slice by parsing the motion vectors and other syntax elements, and uses the prediction information to produce the predictive blocks for the current video block being decoded. For example, motion compensation unit 82 may use one or more syntax elements in a parameter set to determine a prediction mode (e.g., intra- or inter-prediction) used to code the video blocks of the video slice, an inter-prediction slice type (e.g., B slice, P slice, or GPB slice), construction information for one or more reference picture lists for the slice, motion vectors for each inter-encoded video block of the slice, inter-prediction status for each inter-coded video block of the slice, and other information to decode the video blocks in the current video slice.

Motion compensation unit 82 may also perform interpolation based on interpolation filters. Motion compensation unit 82 may use interpolation filters as used by encoding device 104 during encoding of the video blocks to calculate interpolated values for sub-integer pixels of reference blocks. In this case, motion compensation unit 82 may determine the interpolation filters used by encoding device 104 from the received syntax elements, and may use the interpolation filters to produce predictive blocks.

Inverse quantization unit 86 inverse quantizes, or de-quantizes, the quantized transform coefficients provided in the bitstream and decoded by entropy decoding unit 80. The inverse quantization process may include use of a quantization parameter calculated by encoding device 104 for each video block in the video slice to determine a degree of quantization and, likewise, a degree of inverse quantization that should be applied. Inverse transform processing unit 88 applies an inverse transform (e.g., an inverse DCT or other suitable inverse transform), an inverse integer transform, or a conceptually similar inverse transform process, to the transform coefficients in order to produce residual blocks in the pixel domain.

After motion compensation unit 82 generates the predictive block for the current video block based on the motion vectors and other syntax elements, decoding device 112 forms a decoded video block by summing the residual blocks from inverse transform processing unit 88 with the corresponding predictive blocks generated by motion compensation unit 82. Summer 90 represents the component or components that perform this summation operation. If desired, loop filters (either in the coding loop or after the coding loop) may also be used to smooth pixel transitions, or to otherwise improve the video quality. Filter unit 91 is intended to represent one or more loop filters such as a deblocking filter, an adaptive loop filter (ALF), and a sample adaptive offset (SAO) filter. Although filter unit 91 is shown in FIG. 17 as being an in loop filter, in other configurations, filter unit 91 may be implemented as a post loop filter. The decoded video blocks in a given frame or picture are then stored in picture memory 92, which stores reference pictures used for subsequent motion compensation. Picture memory 92 also stores decoded video for later presentation on a display device, such as video destination device 122 shown in FIG. 1.
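
As one illustrative, non-limiting sketch, the summation performed by summer 90 may be expressed as a sample-wise addition of a residual block and a predictive block. The function name and the clipping of each result to the valid sample range for an assumed bit depth are assumptions made for illustration, and loop filtering is omitted.

```python
def reconstruct_block(pred, resid, bit_depth=8):
    """Add a residual block to a predictive block, sample by sample.

    Both blocks are lists of rows of integers with identical dimensions.
    Clipping to [0, 2**bit_depth - 1] is an assumption for illustration.
    """
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(pred_row, resid_row)]
        for pred_row, resid_row in zip(pred, resid)
    ]
```

The reconstructed block produced by such a summation is what would subsequently be filtered, if desired, and stored for use as reference data.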

In the foregoing description, aspects of the application are described with reference to specific embodiments thereof, but those skilled in the art will recognize that the invention is not limited thereto. Thus, while illustrative embodiments of the application have been described in detail herein, it is to be understood that the inventive concepts may be otherwise variously embodied and employed, and that the appended claims are intended to be construed to include such variations, except as limited by the prior art. Various features and aspects of the above-described invention may be used individually or jointly. Further, embodiments can be utilized in any number of environments and applications beyond those described herein without departing from the broader spirit and scope of the specification. The specification and drawings are, accordingly, to be regarded as illustrative rather than restrictive. For the purposes of illustration, methods were described in a particular order. It should be appreciated that in alternate embodiments, the methods may be performed in a different order than that described.

Where components are described as being “configured to” perform certain operations, such configuration can be accomplished, for example, by designing electronic circuits or other hardware to perform the operation, by programming programmable electronic circuits (e.g., microprocessors, or other suitable electronic circuits) to perform the operation, or any combination thereof.

The various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, firmware, or combinations thereof. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

The techniques described herein may also be implemented in electronic hardware, computer software, firmware, or any combination thereof. Such techniques may be implemented in any of a variety of devices such as general purpose computers, wireless communication device handsets, or integrated circuit devices having multiple uses including application in wireless communication device handsets and other devices. Any features described as modules or components may be implemented together in an integrated logic device or separately as discrete but interoperable logic devices. If implemented in software, the techniques may be realized at least in part by a computer-readable data storage medium comprising program code including instructions that, when executed, perform one or more of the methods described above. The computer-readable data storage medium may form part of a computer program product, which may include packaging materials. The computer-readable medium may comprise memory or data storage media, such as random access memory (RAM) such as synchronous dynamic random access memory (SDRAM), read-only memory (ROM), non-volatile random access memory (NVRAM), electrically erasable programmable read-only memory (EEPROM), FLASH memory, magnetic or optical data storage media, and the like. The techniques additionally, or alternatively, may be realized at least in part by a computer-readable communication medium that carries or communicates program code in the form of instructions or data structures and that can be accessed, read, and/or executed by a computer, such as propagated signals or waves.

The program code may be executed by a processor, which may include one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Such a processor may be configured to perform any of the techniques described in this disclosure. A general purpose processor may be a microprocessor; but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration. Accordingly, the term “processor,” as used herein may refer to any of the foregoing structure, any combination of the foregoing structure, or any other structure or apparatus suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated software modules or hardware modules configured for encoding and decoding, or incorporated in a combined video encoder-decoder (CODEC).

What is claimed is:
1. A method of encoding video data, the method comprising: generating an encoded video bitstream comprising multiple layers, the encoded video bitstream including a video parameter set defining parameters of the encoded video bitstream, wherein the video parameter set includes video usability information; determining whether timing information is signaled in the video usability information of the video parameter set; and determining whether to signal hypothetical reference decoder parameters in the video usability information of the video parameter set based on whether timing information is signaled in the video usability information.
2. The method of claim 1, further comprising signaling the hypothetical reference decoder parameters in the video usability information when timing information is signaled in the video usability information.
3. The method of claim 1, further comprising not signaling the hypothetical reference decoder parameters in the video usability information when timing information is not signaled in the video usability information.
4. The method of claim 1, wherein determining whether the timing information is signaled in the video usability information of the video parameter set includes determining a value of a first flag in the video usability information, the first flag indicating whether the timing information is signaled in the video usability information.
5. The method of claim 4, further comprising determining a value of a second flag in the video usability information based on the value of the first flag, the second flag defining whether hypothetical reference decoder parameters are signaled in the video usability information.
6. The method of claim 5, further comprising providing, in the video usability information, one or more syntax elements for signaling information related to the encoded video bitstream, the information including a condition that the value of the second flag is dependent on the value of the first flag.
7. The method of claim 5, further comprising providing, in the video usability information, one or more syntax elements for signaling information related to the encoded video bitstream, the information including a constraint that the value of the second flag is to be set to zero when the value of the first flag is equal to zero.
8. The method of claim 1, the method being executable on a wireless communication device, wherein the device comprises: a memory configured to store the video data; a processor configured to execute instructions to process the video data stored in the memory; and a transmitter configured to transmit the encoded video bitstream including the video parameter set.
9. The method of claim 8, wherein the wireless communication device is a cellular telephone and the encoded video bitstream is modulated according to a cellular communication standard.
10. An apparatus comprising: a memory configured to store video data; and a processor configured to: generate, from the video data, an encoded video bitstream comprising multiple layers, the encoded video bitstream including a video parameter set defining parameters of the encoded video bitstream, wherein the video parameter set includes video usability information; determine whether timing information is signaled in the video usability information of the video parameter set; and determine whether to signal hypothetical reference decoder parameters in the video usability information of the video parameter set based on whether timing information is signaled in the video usability information.
11. The apparatus of claim 10, wherein the processor is configured to signal the hypothetical reference decoder parameters in the video usability information when timing information is signaled in the video usability information.
12. The apparatus of claim 10, wherein the processor is configured to determine not to signal the hypothetical reference decoder parameters in the video usability information when timing information is not signaled in the video usability information.
13. The apparatus of claim 10, wherein determining whether the timing information is signaled in the video usability information of the video parameter set includes determining a value of a first flag in the video usability information, the first flag indicating whether the timing information is signaled in the video usability information.
14. The apparatus of claim 13, wherein the processor is configured to determine a value of a second flag in the video usability information based on the value of the first flag, the second flag defining whether hypothetical reference decoder parameters are signaled in the video usability information.
15. The apparatus of claim 14, wherein the processor is configured to provide, in the video usability information, one or more syntax elements for signaling information related to the encoded video bitstream, the information including a condition that the value of the second flag is dependent on the value of the first flag.
16. The apparatus of claim 14, wherein the processor is configured to provide, in the video usability information, one or more syntax elements for signaling information related to the encoded video bitstream, the information including a constraint that the value of the second flag is to be set to zero when the value of the first flag is equal to zero.
17. The apparatus of claim 10, wherein the apparatus is a wireless communication device, further comprising: a transmitter configured to transmit the encoded video bitstream including the video parameter set.
18. The apparatus of claim 17, wherein the wireless communication device is a cellular telephone and the encoded video bitstream is modulated according to a cellular communication standard.
19. A computer readable medium having stored thereon instructions that when executed by a processor perform a method, including: generating an encoded video bitstream comprising multiple layers, the encoded video bitstream including a video parameter set defining parameters of the encoded video bitstream, wherein the video parameter set includes video usability information; determining whether timing information is signaled in the video usability information of the video parameter set; and determining whether to signal hypothetical reference decoder parameters in the video usability information of the video parameter set based on whether timing information is signaled in the video usability information.
20. The computer readable medium of claim 19, further comprising signaling the hypothetical reference decoder parameters in the video usability information when timing information is signaled in the video usability information.
21. The computer readable medium of claim 19, further comprising not signaling the hypothetical reference decoder parameters in the video usability information when timing information is not signaled in the video usability information.
22. The computer readable medium of claim 19, wherein determining whether the timing information is signaled in the video usability information of the video parameter set includes determining a value of a first flag in the video usability information, the first flag indicating whether the timing information is signaled in the video usability information.
23. The computer readable medium of claim 22, further comprising determining a value of a second flag in the video usability information based on the value of the first flag, the second flag defining whether hypothetical reference decoder parameters are signaled in the video usability information.
24. An apparatus comprising: means for generating an encoded video bitstream comprising multiple layers, the encoded video bitstream including a video parameter set defining parameters of the encoded video bitstream, wherein the video parameter set includes video usability information; means for determining whether timing information is signaled in the video usability information of the video parameter set; and means for determining whether to signal hypothetical reference decoder parameters in the video usability information of the video parameter set based on whether timing information is signaled in the video usability information.
25. The apparatus of claim 24, wherein the hypothetical reference decoder parameters are signaled in the video usability information when timing information is signaled in the video usability information.
26. The apparatus of claim 24, wherein the hypothetical reference decoder parameters are not signaled in the video usability information when timing information is not signaled in the video usability information.
27. The apparatus of claim 24, wherein determining whether the timing information is signaled in the video usability information of the video parameter set includes determining a value of a first flag in the video usability information, the first flag indicating whether the timing information is signaled in the video usability information.
28. The apparatus of claim 27, further comprising means for determining a value of a second flag in the video usability information based on the value of the first flag, the second flag defining whether hypothetical reference decoder parameters are signaled in the video usability information.
29. The apparatus of claim 28, further comprising means for providing, in the video usability information, one or more syntax elements for signaling information related to the encoded video bitstream, the information including a condition that the value of the second flag is dependent on the value of the first flag.
30. The apparatus of claim 28, further comprising means for providing, in the video usability information, one or more syntax elements for signaling information related to the encoded video bitstream, the information including a constraint that the value of the second flag is to be set to zero when the value of the first flag is equal to zero.
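
As one illustrative, non-limiting sketch, the dependency between the first flag and the second flag recited above may be modeled as follows. The class name, field names, and function names are hypothetical and are not syntax element names taken from any standard; the sketch assumes only that the second flag is to be zero whenever the first flag is zero.

```python
from dataclasses import dataclass


@dataclass
class VpsVuiFlags:
    # Hypothetical field names standing in for the "first flag" and "second flag".
    timing_info_present_flag: int
    hrd_params_present_flag: int = 0


def choose_flags(timing_info_signaled, want_hrd_params):
    """Encoder-side decision: HRD parameters may be signaled only with timing info."""
    first = 1 if timing_info_signaled else 0
    second = 1 if (first and want_hrd_params) else 0
    return VpsVuiFlags(first, second)


def satisfies_constraint(flags):
    """Check that the second flag is zero whenever the first flag is zero."""
    return not (flags.timing_info_present_flag == 0
                and flags.hrd_params_present_flag != 0)
```

In this sketch, a request to signal hypothetical reference decoder parameters without timing information results in the second flag being set to zero, so that the flags produced by `choose_flags` satisfy the check in `satisfies_constraint`.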